Preface


Machine learning (ML) has been around for many decades and has been explored
in the past for many practical applications. Currently, ML is interpreted in a broader
context and is finding its way into a number of sectors such as Engineering, Health
Care, Transport including Traffic Prediction and Control, driverless cars, Information
Technology, Big Data Analysis and Processing, Agriculture, Agronomy, etc. It has
also found its way into our daily life, for example, in temperature and lighting controls
and information searches on the internet. In a nutshell, ML is nothing but statistical
inference using data collected or knowledge gained through past targeted studies
or real-life experiences. The sophistication level of ML depends on the intended 
application and the advanced nature of the algorithms used for statistical learning 
and inference. This area has attracted huge interest recently because of the advent 
of the computational power, technology and algorithms required for data training, 
verification and validation, and the readiness and availability of these algorithms for 
application to a wide range of fields and practical systems. Hence, it is very timely 
to overview the various ML techniques or algorithms for big data analyses with a 
specific application to combustion science and technology. 

This particular topic is chosen because of the important role of combustion systems 
and technologies covering more than 90% of the world’s total primary energy supply 
(TPES). Although alternative renewable energy technologies are coming up, their 
shares for the TPES are less than 5% currently and one needs a complete paradigm 
shift to replace combustion sources. Whether this is practical or not is entirely a 
different question and an answer to this question is likely to depend on the respon- 
dent. However, a pragmatic analysis suggests that the combustion share to TPES 
is likely to be more than 70% even by 2070 as discussed in the chapter “Introduc- 
tion” of this book. Hence, it will be prudent to take advantage of ML techniques to 
improve combustion sciences and technologies to better combustion system design 
and development so that the emission of greenhouse gases can be curtailed along with 
improving overall efficiencies. The level of interest in applying ML to combustion is 
clearly evident from the recent surge in research activities on this topic. Hence, the 
aim of this volume is to bring this knowledge together and make it readily accessible 
for researchers and graduate students interested in this multi- and cross-disciplinary 
topic. We attempted to keep the discussion accessible to students and researchers
interested in turbulent combustion, ML techniques, and their application to turbulence
and combustion on a simple physical basis while highlighting the need for ML.

Chapter “Introduction” gives an introduction to the role of combustion technolo- 
gies in the future purely based on the current practical and scientific evidence. This 
chapter also identifies the opportunities to use ML algorithms (MLA) while investi- 
gating turbulent combustion. The chapter “Machine Learning Techniques in Reactive 
Atomistic Simulations” surveys various ML techniques and discusses their applica- 
tion for estimating atomic potential energies, required for chemical kinetics, through 
molecular dynamics simulation as an example. The chapter “A Novel In Situ Machine 
Learning Framework for Intelligent Data Capture and Event Detection” introduces 
in situ training for MLA, which is a useful idea as it can save the considerable effort
required in the training phase while using MLA. The chapter “Machine-Learning
for Stress Tensor Modelling in Large Eddy Simulation” discusses the use of ML 
to estimate subgrid scale stresses and fluxes needed for large eddy simulation of 
turbulent combustion. The application of ML for combustion chemistry is discussed 
in the chapter “Machine Learning for Combustion Chemistry”. The turbulence- 
chemistry interaction is a highly nonlinear stochastic problem ideally suited for 
ML and chapters “Deep Convolutional Neural Networks for Subgrid-Scale Flame
Wrinkling Modeling” to “AI Super-Resolution: Application to Turbulence and
Combustion” give different perspectives on the use of ML for estimating filtered reaction
rate. Data-driven approaches can also be leveraged for reduced-order modeling of 
turbulent combustion and this is discussed in the chapter “Reduced-Order Modeling 
of Reacting Flows Using Data-Driven Approaches”. The use of ML for thermoa- 
coustics is described in chapter “Machine Learning for Thermoacoustics”. Some of 
these chapters are written in a tutorial fashion and also provide hyperlinks to access 
the associated computer codes. The concluding remarks and future directions are 
summarised in the final chapter. Each of the chapters provides ample references for 
further reading by curious readers. 

The idea for this book came during a collaborative project, ALCHEMY (mAchine 
Learning for ComplEx MultiphYsics problems), between Cambridge University and 
ULB funded by Fondation Wiener-Anspach, ULB, Brussels. The funding from this 
foundation is gratefully acknowledged. We cannot overstate the dedication of the
contributors to this volume and we thank them for their contributions. 


Cambridge, UK Nedunchezhian Swaminathan 
Brussels, Belgium Alessandro Parente 
May 2022 
Abstract The annual data published by IEA is analysed to get a projection for the
combustion share in total primary energy supply for the world. This projection clearly 
identifies that more than 60% of world total primary energy supply will come from 
combustion based sources even in the year of 2110 despite an aggressive shift towards 
renewables. Hence, improving and searching for greener combustion technologies 
would be beneficial for addressing global warming. Computational approaches play 
an important role in this search. The large eddy simulation equations are presented and 
discussed. Potential terms which are amenable for using machine learning algorithms 
are identified as a prelude to later chapters of this volume. 


Combustion has been a socio-economically important topic for many centuries and it
remains so because more than 90% of the world’s total primary energy supply
(TPES) is met through combustion in one form or another, see IEA (2021). Even
the recently proposed changes towards low carbon or carbonless fuels, including 
E-fuels, will involve some sort of combustion employing concepts and technologies 
which could be substantially different from those used currently. Figure 1 shows the 
share of various sources for TPES which is about 606 EJ for the year 2019. This is 
nearly 139% more than the energy used in 1973, which suggests an increase of about 3% per year
over the past 46 years, and this is in line with an estimate of about 40% increase in the
global energy consumption for the next two decades by the National Academies of 
Science, Engineering and Medicine, see How we use energy (2022). This projected 
energy demand is likely to be larger because of the widespread use of energy-hungry 
consumer electronics and other technologies such as the Internet of Things (IoT), electric
vehicles, etc. While these technologies bring their own advantages, one cannot
deny their impacts on the environment arising from their manufacturing, end-of-life 
treatments and more importantly higher demand for energy during their lifetime 
leading to global warming related issues. Indeed, the use of energy-hungry modern 
technologies and mitigation of global warming are at the opposite ends and bringing 
them together is a grand challenge requiring carefully constructed solutions. 
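
The growth figures quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that the 139% figure is the total increase relative to 1973 (so that the 2019 demand is about 2.39 times the 1973 value) and compares a simple average with a compound annual rate. The function names are illustrative only.

```python
def simple_annual_rate(total_increase_pct: float, years: int) -> float:
    """Average increase per year, in percent of the starting value."""
    return total_increase_pct / years


def compound_annual_rate(total_increase_pct: float, years: int) -> float:
    """Constant year-on-year growth rate (percent) giving the same total increase."""
    growth_factor = 1.0 + total_increase_pct / 100.0
    return (growth_factor ** (1.0 / years) - 1.0) * 100.0


# Assumed reading of the text: a 139% increase between 1973 and 2019 (46 years).
simple_1973_2019 = simple_annual_rate(139.0, 46)
compound_1973_2019 = compound_annual_rate(139.0, 46)

# The quoted outlook: about 40% increase over the next two decades.
compound_outlook = compound_annual_rate(40.0, 20)

print(f"simple average, 1973-2019:   {simple_1973_2019:.2f} % per year")
print(f"compound rate, 1973-2019:    {compound_1973_2019:.2f} % per year")
print(f"compound rate, 40% in 20 yr: {compound_outlook:.2f} % per year")
```

Under these assumptions the simple average is close to the 3% per year quoted in the text, while the equivalent compound rate is a little under 2% per year, which is comparable to the compound rate implied by a 40% increase over two decades.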

The global temperature is expected to rise in the next 100 years according to the 
intergovernmental panel reports—Future climate changes, risks and impacts (2022), 
and as discussed by Hayhoe et al. (2017). If the emission of greenhouse gases (GHG)
follows a particular Representative Concentration Pathway (RCP 2.6), yielding carbon
emissions (in gigatons) close to zero in the year 2100, and the CO2 concentration
in the atmosphere is about 400 ppm, then the temperature rise is expected to range
from 0.3 to 1.7°C. If the GHG emission is high, following RCP 8.5, then the temperature
rise may range from about 2.6 to 4.8°C, which may result in catastrophic
effects.

Energy production using renewable and sustainable sources has been gaining popularity
and becoming widespread over the past decade. The renewable sources include
hydro, solar, wind and tidal. Nuclear energy may be considered as a renewable
since the uranium deposits could provide energy for billions of years (Cohen 1983) and
there are no GHG emissions (Vasques 2014; Moore 2006). However, the safety issues
and the concept of clean energy may exclude nuclear energy from the renewables.
Figure 1 shows that the share of this energy is 5% for the year 2019 whereas the
renewables share, listed as Others, is only 2.2%. However, this substantial increase 
from 0.1% in 1973 is because of the advent of the renewable technologies in the 
recent past. Photovoltaic systems, both rooftop and commercial, have become popular
but the capital cost projections in Winskel et al. (2009) (see their Fig. 4.1) do
not seem to be realistic (the actual cost is nearly twice the projected cost of about 
£1000 per kW for 2019) because the price will increase as the demand grows unless 
the supply is in surplus. 

The levelised cost of electricity for renewable technologies at utility scale is
becoming lower than that for the traditional fossil fuels (0.038 to 0.076 USD/kWh
depending on the renewables, compared to 0.05 to 0.18 USD/kWh for fossil fuels;
IRENA 2020), which is excellent progress. However, the consumer energy
prices do not reflect this lower cost for the renewables yet. Perhaps this may take
some more time. Although the renewable power generation has increased by nearly 
50% (a total of about 780 GW) for the year 2020 compared to 2019 (IRENA 2021), 
this is substantially lower than the 2019 projection of 1.5 TW for 2020 (IRENA 2019). 
This clearly suggests that the renewables share is growing slowly and one may have 
to accelerate it but the accelerated growth may have its own consequence on the 
environment for the reasons argued in Lørstad et al. (2022a), which are based on the
data for GHG emissions and cradle-to-grave life cycle analysis (LCA) published in 
past studies. For example, electric vehicles projected to have zero emission do
not achieve this in reality according to case-by-case analysis showing that one has to drive a
110 kW size EV for about 35,000 km without recharging to offset the CO2 emitted 
by the battery pack production alone (Alvarez 2019). This is not practical. It is likely 
that combustion will remain as one of the components in the energy technology 
mix and will play an important part for specific applications, such as transport and 
energy-intensive industries, requiring high energy densities but its form and type are 
likely to be different. 


1 Combustion Technology Role 


The mitigation of global warming requires solutions, targeted towards reducing GHG
emissions, which arise from concerted efforts across various continents; countrywide
solutions are inadequate. While a complete shift towards renewables seems
attractive and achievable over longer timescales, the accelerated shift set by various
governments independently does not sound pragmatic. Perhaps, this may worsen
the situation because the additional energy required to achieve the accelerated shift
towards renewables has to come from non-renewables. Thus, a balanced approach to
meet the ever-increasing energy demand without aggravating global warming is needed.

Combustion technologies play an important role in this respect, as suggested by the
results in Fig. 2 showing future projections for the combustion share of world TPES
under three different scenarios (Swaminathan 2019). The inset is the actual data from
the International Energy Agency (IEA 2021) showing a gradual decrease in the combustion
share, while a small rise in 2012 is because of the increase in coal combustion
in some countries in that year. If one makes a naive projection by assuming that the
progress in renewable technologies is steady and organic, following the current trends,
then the combustion share will be more than 75% even by the year 2110 (the solid
line). The slope of this curve is related to the progress and advancement of alternative
energy technologies. If one keeps an optimistic view for these technologies and
presumes that they are progressing at about 50% faster pace compared to the current
trend then the combustion share falls to about 70% in 2110. This share decreases
further to 66% for the year 2110 if one assumes that the alternative technologies
progress at 80% faster pace. To achieve this, a radical paradigm shift is needed and
whether this is practical or not from economic considerations is an open question.
Even the heavily accelerated shift (80% scenario) reduces the combustion share
only by 40% and thus a pragmatic approach is to seek alternative combustion
concepts and technologies which can significantly reduce GHG emissions and can
act as retrofits to the existing combustion systems, which can also aid a quicker shift
towards renewables in the longer run.

Fig. 2 Combustion share of world TPES and its future projections. Adapted from Swaminathan
(2019)

Many alternative combustion concepts such as fuel-lean and MILD (moderate or
intense low-oxygen dilution) combustion emerge as potential solutions since they could
deliver both low emissions and high efficiency. However, using them for practical
applications brings its own challenges as discussed by Swaminathan and Bray
(2011) and Lørstad et al. (2022a). Also, carbon-free and E-fuels are emerging as
potential alternative solutions to mitigate CO2 emission while catering to the
ever-increasing energy demand. Specifically, hydrogen combustion seems to be gaining
momentum with a view to use hydrogen as a main energy carrier. Although this
solution addresses the CO2 emission directly, it brings additional challenges for its
safe usage, controlled combustion for practical applications and potential increase
in NOx emissions. One of the current NOx reduction technologies can be utilised to
control this emission from hydrogen or E-fuel combustion. Nevertheless, the distribution
of hydrogen from production sites to consumers is challenging as it requires
a complete infrastructure overhaul, and the scale of economy for this cannot be
underestimated, adding further challenges.

Modern computational methods and approaches play significant parts in developing
these alternative technologies and taking them to fruition. The use of machine
learning algorithms (MLA) and techniques in computational fluid dynamics (CFD),
specifically for turbulent flows and turbulent combustion, is gaining renewed
momentum in recent times for two reasons, viz., (i) these algorithms and techniques
have evolved and developed for widespread use across various disciplines and
(ii) their robustness, accuracy and computational efficiency can be exploited
so that the CFD codes with MLA can be employed for quick evaluations of design
changes. Before discussing the role of MLA in computational simulations of turbu- 
lent flows with chemical reactions, let us briefly review the governing principles and 
equations, and various computational methods used for turbulent combustion. The 
topic of turbulent reacting flow simulations has been discussed elaborately in many 
books, see for example Swaminathan and Bray (2011), Libby and Williams (1980), 
Poinsot and Veynante (2005), Echekki and Mastorakos (2011), Swaminathan et al. 
(2022b); only a brief review with the detail required to fulfil the aim of this volume is
given in the next section.


2 Governing Equations 


The computational simulations of turbulent reacting flows use three numerical 
approaches, namely direct numerical simulation (DNS), large eddy simulation 
(LES) and Reynolds-Averaged Navier Stokes calculation (RANS). These approaches 
involve different levels of detail, approximations and modelling. The complete set
of conservation equations is solved with no models using high-order numerical
schemes in the DNS approach and further detail can be found in many books, for
example Poinsot and Veynante (2005). This approach resolves and captures the range
of scales in the flow, from dissipative to energy-containing, without using any modeling
approximations and this range increases with turbulence Reynolds number,
$Re_t$. The ranges of spatial and temporal scales vary as $Re_t^{3/4}$ and $Re_t^{1/2}$ respectively
and thus the computational cost for using DNS at $Re_t$ relevant for practical applications
in appropriate geometry is prohibitive. Hence, this approach is typically used
to gain fundamental understanding of turbulence and its interaction with chemical
reactions, and this knowledge is important for devising engineering models for
practical use. There are many examples for this which are discussed and summarised 
in Swaminathan and Bray (2011), Poinsot and Veynante (2005), Echekki and Mas- 
torakos (2011), Swaminathan et al. (2022b). Appropriately averaged conservation 
equations are solved in the RANS approach along with closure models and approx- 
imations, which are discussed elaborately in many past works, for example see the 
books edited by Libby and Williams (1980, 1994) and the works in Swaminathan and 
Bray (2011). The RANS equations are deterministic and do not have the stochastic 
aspects required for statistical inference and hence one must be cautious in using
MLA for RANS calculations. However, it is possible to use some of the machine
learning algorithms to address the uncertainties of RANS model parameters. The LES
approach is well suited to make use of MLA since there is inherent stochasticity.
Before identifying the potential avenues to use MLA for LES, let us briefly review 
the required governing equations. 
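
To give a feel for why DNS becomes prohibitive at practical Reynolds numbers, the scaling argument can be turned into a rough relative cost estimate. The sketch below assumes the usual estimates of $Re_t^{3/4}$ for the range of spatial scales and $Re_t^{1/2}$ for the range of temporal scales, a three-dimensional domain, and a number of time steps proportional to the temporal range. Prefactors are omitted, so only cost ratios are meaningful, and the function name and reference value are illustrative.

```python
def dns_relative_cost(re_t: float, re_ref: float = 100.0) -> float:
    """Cost of DNS at re_t relative to DNS at re_ref (prefactors cancel)."""

    def cost(re: float) -> float:
        points_per_direction = re ** 0.75  # spatial range ~ Re_t^(3/4)
        time_steps = re ** 0.5             # temporal range ~ Re_t^(1/2)
        # Three spatial directions times the number of time steps ~ Re_t^(11/4).
        return points_per_direction ** 3 * time_steps

    return cost(re_t) / cost(re_ref)


for re_t in (1.0e2, 1.0e3, 1.0e4):
    print(f"Re_t = {re_t:8.0f}: relative cost = {dns_relative_cost(re_t):.3g}")
```

With these assumptions each tenfold increase in $Re_t$ raises the cost by a factor of $10^{11/4}$, that is, by more than 500.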


3 Equations for LES 


In large eddy simulations, the low-pass filtered governing equations for mass, 
momentum, energy and species mass fractions are solved. The filtering, or sepa- 
ration of the scales, is done with a spatial filter, which is applied to the governing 
equations for the above quantities. The various filters and their attributes are discussed 
in many textbooks, for example see Pope (2000). Favre filtering, also known
as density-weighted filtering, is commonly used for flows such as turbulent combustion
involving strong density variations. The filtering implies that the dynamic
large scales, which are larger than the filter cut-off scale, are resolved and the scales
smaller than the cut-off scale, known as subgrid scales (SGS), are modelled. Hence,
the computational cost for LES is much lower than that for DNS because coarser
grids and larger time steps can be used for a similar level of numerical fidelity.
The Favre-filtered governing equations are written as 


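Mass:
$$\frac{\partial \bar{\rho}}{\partial t} + \nabla \cdot \left(\bar{\rho}\,\tilde{\mathbf{u}}\right) = 0 \qquad (1)$$

Momentum:
$$\frac{\partial \bar{\rho}\,\tilde{\mathbf{u}}}{\partial t} + \nabla \cdot \left(\bar{\rho}\,\tilde{\mathbf{u}}\tilde{\mathbf{u}}\right) + \nabla \bar{p} - \nabla \cdot \bar{\boldsymbol{\tau}} = -\nabla \cdot \boldsymbol{\tau}^{sgs} \qquad (2)$$

Energy:
$$\bar{\rho}\,\frac{D \tilde{h}}{D t} = \frac{D \bar{p}}{D t} - \nabla \cdot \left(\bar{\mathbf{q}} + \overline{\rho \sum_{i=1}^{N_s} Y_i \mathbf{U}_i h_i}\right) + \overline{\boldsymbol{\tau} : \nabla \mathbf{u}} + \bar{\dot{Q}} + \Pi_{sgs} - \nabla \cdot \mathbf{q}^{sgs} \qquad (3)$$

Species:
$$\frac{\partial \bar{\rho}\,\tilde{Y}_i}{\partial t} + \nabla \cdot \left(\bar{\rho}\,\tilde{\mathbf{u}}\tilde{Y}_i\right) = \nabla \cdot \left(-\overline{\rho Y_i \mathbf{U}_i}\right) + \bar{\dot{\omega}}_i - \nabla \cdot \boldsymbol{\Gamma}_i \qquad (4)$$

using standard notations and $\mathbf{U}_i$ is the diffusion velocity of species $i$.
The filtering procedure yields extra terms, SGS stress tensor $\boldsymbol{\tau}^{sgs}$, SGS enthalpy
flux $\mathbf{q}^{sgs}$, SGS pressure-dilation $\Pi_{sgs}$, and SGS species flux $\boldsymbol{\Gamma}_i$, given by

$$\boldsymbol{\tau}^{sgs} = \bar{\rho}\left(\widetilde{\mathbf{u}\mathbf{u}} - \tilde{\mathbf{u}}\tilde{\mathbf{u}}\right) \qquad (5)$$
$$\mathbf{q}^{sgs} = \bar{\rho}\left(\widetilde{\mathbf{u}h} - \tilde{\mathbf{u}}\tilde{h}\right) \qquad (6)$$
$$\Pi_{sgs} = \overline{\mathbf{u} \cdot \nabla p} - \tilde{\mathbf{u}} \cdot \nabla \bar{p} \qquad (7)$$
$$\boldsymbol{\Gamma}_i = \bar{\rho}\left(\widetilde{\mathbf{u}Y_i} - \tilde{\mathbf{u}}\tilde{Y}_i\right) \qquad (8)$$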


These unknown quantities represent the influence of unresolved scales on the resolved 
scales and need closure models. The pressure-dilation in Eq. (7) is sometimes less
important in compressible flows and is therefore commonly neglected (Piomelli 1999;
Martin et al. 2000). A plausible model for it and its limitations are explored
in Langella et al. (2017). Closure models are required for all the SGS quantities in
Eqs. (5) to (8), molecular diffusion related quantities in Eqs. (2) to (4) and the species
reaction rate $\bar{\dot{\omega}}_i$. The molecular diffusion of momentum (viscous shear, $\boldsymbol{\tau}$), energy
(heat flux, $\mathbf{q}$), and species (diffusive flux, $-\overline{\rho Y_i \mathbf{U}_i}$) are modeled following classical
ideas of gradient diffusion after neglecting the fluctuations in viscosity, diffusivity
and heat conductivity (Piomelli 1999; Gicquel 2012). Further details on these models
and the LES governing equations are discussed in Pope (2000), Poinsot and Veynante
(2005) and Garnier et al. (2009). 
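
The meaning of Favre filtering and of an SGS flux can be illustrated on a one-dimensional periodic grid. The sketch below assumes a simple top-hat (box) filter, defines the density-weighted filter as $\tilde{\phi} = \overline{\rho\phi}/\bar{\rho}$, and evaluates the SGS species flux as $\bar{\rho}\,(\widetilde{uY} - \tilde{u}\tilde{Y})$. The fields are synthetic and all function names are illustrative; this is not the filter or data of any particular LES code.

```python
import math


def box_filter(field, half_width):
    """Top-hat filter on a periodic 1D grid: mean over 2*half_width + 1 points."""
    n = len(field)
    width = 2 * half_width + 1
    return [
        sum(field[(i + k) % n] for k in range(-half_width, half_width + 1)) / width
        for i in range(n)
    ]


def favre_filter(rho, phi, half_width):
    """Density-weighted filter: tilde(phi) = bar(rho * phi) / bar(rho)."""
    rho_bar = box_filter(rho, half_width)
    rho_phi_bar = box_filter([r * p for r, p in zip(rho, phi)], half_width)
    return [rp / r for rp, r in zip(rho_phi_bar, rho_bar)]


def sgs_scalar_flux(rho, u, y, half_width):
    """1D SGS scalar flux: bar(rho) * (tilde(u*y) - tilde(u) * tilde(y))."""
    rho_bar = box_filter(rho, half_width)
    uy_tilde = favre_filter(rho, [a * b for a, b in zip(u, y)], half_width)
    u_tilde = favre_filter(rho, u, half_width)
    y_tilde = favre_filter(rho, y, half_width)
    return [
        r * (uy - ut * yt)
        for r, uy, ut, yt in zip(rho_bar, uy_tilde, u_tilde, y_tilde)
    ]


# Synthetic periodic fields (illustrative only, not simulation data).
n = 64
x = [2.0 * math.pi * i / n for i in range(n)]
rho = [1.0 + 0.5 * math.sin(xi) for xi in x]
u = [math.sin(4.0 * xi) for xi in x]
y = [0.5 + 0.5 * math.cos(4.0 * xi) for xi in x]

gamma = sgs_scalar_flux(rho, u, y, half_width=4)
print(f"max |SGS flux| = {max(abs(g) for g in gamma):.4f}")
```

The flux vanishes when the velocity has no variation inside the filter width, which is one way of seeing that it represents only the effect of unresolved scales.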


3.1 SGS Closures 


A few common closures for the SGS terms given in Eqs. (5) to (8) are discussed
briefly here. The eddy viscosity models are the simplest ones for the SGS stress
in Eq. (5) and the most popular of these is the classical Smagorinsky model (Smagorinsky
1963), which has been extended to the SGS kinetic energy by Yoshizawa (1986).
The Smagorinsky model, in tensor notation, is

$$\tau^{sgs}_{ij} - \frac{\delta_{ij}}{3}\,\tau^{sgs}_{kk} = -2\,C_s^2\,\Delta^2\,\bar{\rho}\,|\tilde{S}|\left(\tilde{S}_{ij} - \frac{\delta_{ij}}{3}\,\tilde{S}_{kk}\right) = -2\,\bar{\rho}\,\nu_{sgs}\left(\tilde{S}_{ij} - \frac{\delta_{ij}}{3}\,\tilde{S}_{kk}\right), \quad \text{and} \qquad (9)$$

$$\tau^{sgs}_{kk} = 2\,C_I\,\bar{\rho}\,\Delta^2\,|\tilde{S}|^2 \qquad (10)$$

where $\tilde{S}_{ij} = 0.5\,(\partial \tilde{u}_i/\partial x_j + \partial \tilde{u}_j/\partial x_i)$ is the resolved symmetric strain-rate tensor
and $|\tilde{S}| = \sqrt{2\,\tilde{S}_{ij}\tilde{S}_{ij}}$. The filter width, estimated typically using the local numerical
cell volume, is denoted as $\Delta$. Equation (9) defines the SGS eddy viscosity, $\nu_{sgs}$,
and the symbols $C_s$ and $C_I$ are model constants. The $\tau^{sgs}_{kk}$, which is twice the SGS
kinetic energy, is likely to be small or negligible in low Mach number flows as noted
by Martin et al. (2000) but may not be so for flows with strong heat release.
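
A minimal sketch of the static Smagorinsky closure at a single grid point is given below. It assumes the eddy viscosity $\nu_{sgs} = (C_s \Delta)^2 |\tilde{S}|$ with $|\tilde{S}| = \sqrt{2\,\tilde{S}_{ij}\tilde{S}_{ij}}$, and returns the deviatoric SGS stress $-2\,\bar{\rho}\,\nu_{sgs}\,(\tilde{S}_{ij} - \delta_{ij}\tilde{S}_{kk}/3)$. The value $C_s = 0.17$ is an assumed, commonly quoted constant, and the velocity gradient, density and filter width in the example are hypothetical.

```python
import math


def strain_rate(grad_u):
    """Symmetric part of a 3x3 velocity-gradient tensor, grad_u[i][j] = du_i/dx_j."""
    return [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(3)] for i in range(3)]


def strain_magnitude(s):
    """|S| = sqrt(2 S_ij S_ij)."""
    return math.sqrt(2.0 * sum(s[i][j] ** 2 for i in range(3) for j in range(3)))


def smagorinsky_viscosity(grad_u, delta, c_s=0.17):
    """nu_sgs = (C_s * delta)^2 * |S|; c_s = 0.17 is an assumed value."""
    return (c_s * delta) ** 2 * strain_magnitude(strain_rate(grad_u))


def deviatoric_sgs_stress(grad_u, rho_bar, delta, c_s=0.17):
    """Deviatoric SGS stress: -2 rho_bar nu_sgs (S_ij - delta_ij S_kk / 3)."""
    s = strain_rate(grad_u)
    trace = s[0][0] + s[1][1] + s[2][2]
    nu = smagorinsky_viscosity(grad_u, delta, c_s)
    return [
        [-2.0 * rho_bar * nu * (s[i][j] - (trace / 3.0 if i == j else 0.0))
         for j in range(3)]
        for i in range(3)
    ]


# Hypothetical simple shear: du_1/dx_2 = 100 1/s, filter width 1 mm, density 1.2 kg/m^3.
shear = [[0.0, 100.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
nu_sgs = smagorinsky_viscosity(shear, delta=1.0e-3)
tau_dev = deviatoric_sgs_stress(shear, rho_bar=1.2, delta=1.0e-3)
print(f"nu_sgs = {nu_sgs:.3e} m^2/s, tau_12 = {tau_dev[0][1]:.3e} Pa")
```

Because the eddy viscosity depends only on the local strain rate and a fixed constant, it does not vanish in a laminar shear layer or at a wall, which is the limitation that damping functions or a dynamic procedure are meant to address.
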

The Smagorinsky model is relatively simple and robust, but it has its limitations
for near-wall and transitional flows since it can give a non-vanishing eddy viscosity,
which is unphysical. This can be remedied by invoking damping functions, but
an alternative approach is to use a dynamic procedure to determine $C_s$ and $C_I$ as
proposed in Moin et al. (1991). This approach is used widely by applying a second
filter of typical width $\hat{\Delta} = 2\Delta$ to the resolved fields to compute the resolved stress
near the filter cut-off. Assuming similarity of the stresses near the cut-off scale, $\Delta$,
this resolved stress can be used to find an expression for $C_s$ and $C_I$ in terms of the
resolved velocity gradients, see Pope (2000), Martin et al. (2000) and Garnier et al.
(2009).

The dynamic procedure allows the model to adapt itself to the local flow changes
and hence $\nu_{sgs}$ naturally approaches zero near solid walls and in laminar regions,
which retains physical behaviour. The dynamic procedure can produce $\nu_{sgs} < 0$,
implying an instantaneous reverse cascade of kinetic energy locally, which may occur
in turbulent flows. However, this can lead to numerical instabilities and therefore
it is common to clip $C_s$ to avoid negative $\nu_{sgs}$ or to average it in either space or
time.
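
The two stabilisation options mentioned above, clipping and averaging, can be sketched as follows. The sketch assumes that the dynamic procedure has already produced, at each point along a homogeneous direction, a numerator and a denominator whose ratio gives the local model coefficient; this ratio form and the sample values are assumptions made for illustration and are not taken from the cited works.

```python
def clipped_coefficient(numerator, denominator, eps=1e-30):
    """Pointwise coefficient, clipped at zero to avoid a negative eddy viscosity."""
    return [max(n / (d + eps), 0.0) for n, d in zip(numerator, denominator)]


def averaged_coefficient(numerator, denominator, eps=1e-30):
    """Single coefficient from sums over a homogeneous direction (or over time)."""
    return max(sum(numerator) / (sum(denominator) + eps), 0.0)


# Hypothetical local values; the second point would give a negative coefficient.
num = [0.02, -0.01, 0.03, 0.01]
den = [1.0, 1.0, 1.0, 1.0]

local = clipped_coefficient(num, den)
mean = averaged_coefficient(num, den)
print("clipped local values:", local)
print(f"averaged value: {mean:.4f}")
```

Clipping removes local backscatter entirely, whereas averaging keeps its net effect on the mean coefficient. Which option is preferable depends on the flow and on whether a homogeneous direction exists.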

Other algebraic approaches have also been developed in past studies to overcome this
specific issue of $\nu_{sgs}$ not approaching zero near a wall in wall-bounded flows. Details
can be found in Vreman (2004), Nicoud and Ducros (1999), Nicoud et al. (2011). An
alternative approach to estimate $\nu_{sgs}$ uses the SGS turbulent kinetic energy, $k_{sgs}$,
obtained directly by using its transport equation, see Yoshizawa and Horiuti (1985)
and Ghosal et al. (1995). Various approaches have also been proposed, developed and 
tested for the SGS stresses in many past studies and detail can be found in Zang et al. 
(1993), Lesieur and Métais (1996), Layton (1996), Kosovic (1997), Misra and Pullin 
(1997), Meneveau and Katz (1997), Armenio and Piomelli (2000), Domaradzki and 
Adams (2002), Chaouat and Schiestel (2005), Lucor et al. (2007). 

Further to the SGS stress discussed above, the SGS fluxes need modelling and
a straightforward approach is to use an eddy diffusivity model written as

$$\boldsymbol{\Gamma}_i = -\frac{\bar{\rho}\,\nu_{sgs}}{Sc_{sgs}}\,\nabla \tilde{Y}_i \quad \text{and} \quad \mathbf{q}^{sgs} = -\frac{\bar{\rho}\,\nu_{sgs}}{Pr_{sgs}}\,\nabla \tilde{h} \qquad (11)$$

for species and enthalpy respectively. The symbols $Sc_{sgs}$ and $Pr_{sgs}$ are the SGS
Schmidt and Prandtl numbers respectively. These quantities may be estimated using
a static or dynamic procedure, see Martin et al. (2000), Garnier et al. (2009) and 
Moin et al. (1991). Many other models for the SGS stresses and fluxes have been 
developed and tested in past studies (Martin et al. 2000; Garnier et al. 2009; Silvis 
et al. 2017) and these models are introduced and discussed in later chapters, specifi- 
cally in chapter “Machine-Learning for Stress Tensor Modelling in Large Eddy Sim- 
ulation”. The statistics obtained using these models could show some sensitivities to 
errors introduced by the numerical scheme, especially for second order statistics and 
thus some care is needed. Perhaps, one way to address these issues is to use MLA 
to estimate the model parameters, which is discussed in chapter “Machine-Learning 
for Stress Tensor Modelling in Large Eddy Simulation”. 
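
The eddy diffusivity closure for the SGS fluxes amounts to a flux directed down the resolved gradient, $-(\bar{\rho}\,\nu_{sgs}/Sc_{sgs})\,\nabla\tilde{Y}_i$ for species and the same form with $Pr_{sgs}$ for enthalpy. A one-dimensional sketch is given below; the values used for $\bar{\rho}$, $\nu_{sgs}$, the gradients and $Sc_{sgs} = 0.7$ are hypothetical, and the function name is illustrative.

```python
def eddy_diffusivity_flux(rho_bar, nu_sgs, grad_phi, turbulent_number):
    """Gradient-type SGS flux: -(rho_bar * nu_sgs / turbulent_number) * grad(phi)."""
    return [-rho_bar * nu_sgs / turbulent_number * g for g in grad_phi]


# Hypothetical resolved mass-fraction gradients (1/m) at three grid points.
grad_y = [100.0, 0.0, -50.0]
species_flux = eddy_diffusivity_flux(
    rho_bar=1.2, nu_sgs=2.89e-6, grad_phi=grad_y, turbulent_number=0.7
)
print(species_flux)
```

By construction this flux always opposes the resolved gradient, so a closure of this type cannot represent counter-gradient transport.
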

The chemical reaction rate in the species equation, Eq. (4), is important for turbulent
combustion. The physical processes represented by this term typically occur
at SGS level. Also, the reaction rate is a highly nonlinear function of temperature,
$T$, and species mass fractions, $Y_i$, and hence it cannot be expressed in a meaningful
way using only the resolved temperature and species mass fractions. Formulating
a robust yet accurate SGS closure for the reaction rate is challenging and important,
and this has been investigated in past studies which are reviewed and summarised
in many references, see for example Swaminathan and Bray (2011), Poinsot and
Veynante (2005), Swaminathan et al. (2022b), Gicquel et al. (2012), Peters (2000),
Pitsch (2006), Rutland (2011). Each of these approaches has its advantages and
limitations in terms of predictive ability, simplicity, ease of use, computational
expense and physical basis, and these aspects are discussed in past works, for example
see Swaminathan et al. (2022b). In the following, we give a brief overview of the
challenges involved in LES and the role of MLA in tackling them, which also helps us
to articulate the objectives for this volume.
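
The consequence of this nonlinearity can be seen with a small numerical example. The sketch below considers only the temperature dependence through an Arrhenius-type factor $\exp(-T_a/T)$ and compares the factor evaluated at the averaged temperature with the average of the factor over two subgrid states. The activation temperature and the two states are hypothetical, and a simple (not density-weighted) average is used for brevity.

```python
import math


def arrhenius_factor(temperature, activation_temperature=15000.0):
    """exp(-T_a / T); T_a = 15000 K is a hypothetical value for illustration."""
    return math.exp(-activation_temperature / temperature)


# Two equally likely subgrid states inside one LES cell (hypothetical values, in K):
# one resembling unburnt gas and one resembling burnt gas.
subgrid_temperatures = [600.0, 2000.0]

filtered_temperature = sum(subgrid_temperatures) / len(subgrid_temperatures)
rate_at_filtered_temperature = arrhenius_factor(filtered_temperature)
filtered_rate = sum(arrhenius_factor(t) for t in subgrid_temperatures) / len(
    subgrid_temperatures
)

print(f"factor at averaged temperature: {rate_at_filtered_temperature:.3e}")
print(f"average of the factor:          {filtered_rate:.3e}")
print(f"ratio:                          {filtered_rate / rate_at_filtered_temperature:.1f}")
```

For these assumed values the two estimates differ by more than an order of magnitude, which is why the filtered reaction rate needs its own SGS closure rather than an evaluation at the resolved state.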


3.2 LES Challenges and Role of MLA 


The SGS closures are predominantly based on the gradient flux hypothesis as dis- 
cussed in the previous subsection and it is well known that in reacting flows there 
are processes which defy this hypothesis. Hence, modelling counter-gradient subgrid
scalar fluxes is still an outstanding issue, specifically for low Reynolds number
reacting flows. Despite this, LES calculations with the gradient flux models have
shown good agreement between the computed and measured statistics, suggesting
that these models are sufficient for flows of interest to practical systems. Another
challenge for LES is on the near-wall flow characteristics. It is quite well known that 
practical LES cannot recover the law of the wall and some special numerical treat- 
ments are required as noted by Nikitin et al. (2000) and Brasseur and Wei (2010). 
Recovering the law of the wall becomes important when the heat and momentum 
fluxes through the walls (of the combustor, for example) need to be evaluated as 
design variables. 

It is observed generally that the numerical grids used for LES of reacting flows
resolve the instantaneous flame structure to some extent, which is acceptable for atmospheric
pressure. High pressure flows in complex geometries are common in practical
applications and thus resolving the instantaneous flame structure will likely yield
impractical grid cell counts because the flame thickness approximately scales as
$\delta_{th} \sim p^{-1}$ (Turns 2006) and some of the important geometry details need to be captured
in the grid. Thus, the common practice of using grids having cell sizes of the
order of $\delta_{th}$ is unattractive for practical LES. Consequently, SGS combustion models
have to be robust and accurate in representing the relevant physical processes and
machine learning algorithms can play an important role here. Probably, it is useful to
design or select a grid resolving most of the kinetic energy in the flow and let the SGS
closures, specifically for combustion, handle the turbulence-chemistry interactions
and their intricacies for LES of reacting flows in practical systems. The guidance
suggested by Pope (2000), which is $K = k_{sgs}/(k_{res} + k_{sgs}) < 0.2$, where $k_{sgs}$ and $k_{res}$
are subgrid scale and resolved kinetic energies respectively, may be used. It is to be
noted that this condition can only be evaluated after completing a preliminary LES 
of non-reacting flow in a given geometry. Alternative measures to evaluate LES grid 
requirement have also been suggested in past studies. However, the parameter K is 
quite practical and useful, and thus it is recommended. This requirement is to be 
applied for flows before igniting the flame and thus checking and satisfying this grid 
requirement are quite straightforward since the LES of non-reacting flow is the first 
step in conducting LES of turbulent combustion. 
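
The grid-quality check described above is simple to apply once a preliminary non-reacting LES is available. The sketch below evaluates $K = k_{sgs}/(k_{res} + k_{sgs})$ at a few sample points and flags those where $K$ is not below 0.2. The kinetic energy values are hypothetical and the function names are illustrative.

```python
def unresolved_energy_fraction(k_res, k_sgs):
    """K = k_sgs / (k_res + k_sgs), the fraction of kinetic energy left unresolved."""
    total = k_res + k_sgs
    if total <= 0.0:
        raise ValueError("total kinetic energy must be positive")
    return k_sgs / total


def grid_is_adequate(k_res, k_sgs, threshold=0.2):
    """True when less than the threshold fraction of kinetic energy is unresolved."""
    return unresolved_energy_fraction(k_res, k_sgs) < threshold


# Hypothetical (k_res, k_sgs) pairs in m^2/s^2 from a preliminary cold-flow LES.
samples = [(9.0, 1.0), (3.0, 1.5), (20.0, 2.0)]
for k_res, k_sgs in samples:
    k_ratio = unresolved_energy_fraction(k_res, k_sgs)
    status = "ok" if grid_is_adequate(k_res, k_sgs) else "refine"
    print(f"k_res = {k_res:5.1f}, k_sgs = {k_sgs:4.1f}, K = {k_ratio:.3f} -> {status}")
```

In practice such a check would be applied to time-averaged fields over the whole domain, and regions flagged for refinement would guide the next grid.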

Machine learning algorithms can play a vital role in turbulent combustion calcu- 
lations. These algorithms can be leveraged to build SGS models which can reduce 
computational requirements substantially. However, using MLA for these purposes
is not common yet, although there is a surge of research activities in this direction. The
subgrid fluid dynamic and combustion processes and their interactions are highly
non-linear stochastic events and thus MLA is well suited to infer the SGS statistics
required for LES. Typically, machine learning methods are used for pattern
recognition in various fields (Hinton et al. 2012; Sathiesh et al. 2016; Gogul and 
Sathiesh Kumar 2017) and are finding their ways into other fields such as climate 
modelling (Watson-Parris 2021), drug discovery (Bhati et al. 2021) and fluid mechan- 
ics (Brunton et al. 2020). Their application to reacting flows is gaining momentum 
although it is still at an early development and validation stage. Hence, the objective 
for compiling this volume is to bring together the latest developments in MLA and its 
application to chemically reacting flows and make it readily accessible for researchers 
and graduate students interested in this multi- and cross-disciplinary topic. 


4 Objectives 


The broad aim here is to bring together the recent developments in the field of 
MLA applied to reacting flow calculations. These flows in practical systems are 
invariably turbulent and hence there are three important aspects, viz., turbulence, 
chemical reactions and their interactions, requiring close attention. The chemical 
reactions are because of molecular collisions but, at continuum level of description 
used commonly for turbulent reacting flow simulations, they are modelled using 
Arrhenius rate expressions involving kinetic parameters. These parameters, related 
to the atomic potential energies, are obtained typically using shock tube experiments
but recent advances in ML techniques are helping to estimate these parameters
using atomistic molecular dynamic simulations as described in chapter “Machine 
Learning Techniques in Reactive Atomistic Simulations”. This chapter also gives 
an overview of various ML algorithms. One needs large data sets to train and
validate these algorithms before using them for inferring quantities of interest, and
their robustness depends on the conditions covered in the data sets; hence
these data sets can be huge. One therefore needs a clever and intelligent algorithm to
detect events/patterns of interest in the data. Machine learning algorithms can come
in handy for this purpose, as discussed in chapter “A Novel In Situ Machine Learning
Framework for Intelligent Data Capture and Event Detection”, which suggests an
interesting idea, in situ training, to train MLA. The application of MLA to infer SGS
stresses and fluxes is described in chapter “Machine-Learning for Stress Tensor
Modelling in Large Eddy Simulation”. The combustion chemistry is quite complex 
even for a simple fuel like methane or hydrogen and involves a large number of 
elementary reactions with disparate time and length scales. Hence integrating these
reactions into numerical simulations of turbulent combustion can make the simulations
prohibitively expensive. Machine learning can be leveraged to accelerate
chemistry integration by helping us to understand combustion chemistry closely 
as described in chapter “Machine Learning for Combustion Chemistry”. The third 
aspect, turbulence-chemistry interaction, of turbulent combustion noted above can 
be addressed using different modelling approaches, which help us to estimate the
filtered reaction rate of a chemical species or a reaction progress variable depending
on the modelling approach used. The application of machine learning algorithms to
these approaches is discussed in chapters “Deep Convolutional Neural Networks
for Subgrid-Scale Flame Wrinkling Modeling” to “AI Super-Resolution: Applica- 
tion to Turbulence and Combustion”. Obeying constraints coming from physical 
conservation laws and requirements (for example, species mass fractions have to be
positive or zero) can become an issue for machine learning methods and some extra
care is required while defining the cost function needed in the training step for 
machine learning algorithms, see chapters “Machine Learning Techniques in Reac- 
tive Atomistic Simulations” and “AI Super-Resolution: Application to Turbulence 
and Combustion”. The interaction between fluctuating heat release rate and pressure
in turbulent combustion established inside a tube, as in many practical combustion
systems such as gas turbines and rocket engines, can give rise to thermoacoustic
oscillations, which can become an issue for the safe operation of these systems if
they are not controlled. Predicting these oscillations and their onset is challenging,
and machine learning algorithms can be applied to these problems as described
in chapter “Machine Learning for Thermoacoustics”. The concluding remarks are 
drawn in the final chapter.


Machine Learning Techniques
in Reactive Atomistic Simulations


H. Aktulga, V. Ravindra, A. Grama, and S. Pandit 


Abstract This chapter describes recent advances in the use of machine learning 
techniques in reactive atomistic simulations. In particular, it provides an overview 
of techniques used in training force fields with closed form potentials, develop- 
ing machine-learning-based potentials, use of machine learning in accelerating the 
simulation process, and analytics techniques for drawing insights from simulation 
results. The chapter covers basic machine learning techniques, training procedures 
and loss functions, issues of off-line and in-lined training, and associated numerical 
and algorithmic issues. The chapter highlights key outstanding challenges, promising 
approaches, and potential future developments. While the chapter relies on reactive 
atomistic simulations to motivate models and methods, these are more generally 
applicable to other modeling paradigms for reactive flows. 


1 Introduction and Overview 


Time-dependent reactive simulations involve complex interaction models that must 
be trained using experimental or highly resolved simulation data. The training pro- 
cess as well as data acquisition are often computationally expensive. Once trained, 
the coupling models are incorporated into reactive simulation procedures that involve 
small time-steps, and generate large amounts of data that must be effectively analyzed
for drawing scientific insights. The past few decades have witnessed significant
advances in each of these facets. More recently, increasing attention has been
focused on the development and application of machine learning (ML) techniques 
for increasing the accuracy, generalizability, and speed of such simulations. 

In this chapter, we provide an overview of ML models and methods, along with 
their use in reactive particle simulations. We use highly resolved reactive atomistic 
simulations as the model problem for motivating and describing ML methods. We 
start by first presenting an overview of common ML techniques that are broadly 
used in the field. We then present the use of these techniques in training interaction 
models for reactive atomistic simulations. Recent work has focused on overcoming 
the time-step constraints of conventional reactive atomistic methods—we describe 
these methods and survey key results in the area. Finally we discuss the use of ML 
techniques in analyzing atomistic trajectories. The goal of the Chapter is to provide 
readers with a broad understanding of the state of the art in the area, unresolved chal- 
lenges, and available methods and software for constructing simulations in diverse 
application domains. While we use reactive atomistics as our model problem, the 
discussion is broadly applicable to other particle-based/discrete element simulation
paradigms. 

Reactive atomistic simulations provide understanding of chemical processes at the 
atomic level, which are usually not accessible through common experimental tech- 
niques. Quantum chemistry methods have come a long way in modeling electronic 
structures and subsequent chemical changes at the scale of a few atoms. However, if 
the interest is in the thermodynamics of chemical reactions then atomistic techniques 
are the methods of choice. Here, individual reactions are modeled in an approximate 
sense but system size (or particle number) approaches the thermodynamic limit (or a
suitable approximation thereof, i.e., as large as practical). One of the simplest sampling
techniques used in atomistic simulations is molecular dynamics, which provides a
pseudo-Newtonian trajectory of the system, and is applicable in modeling equilibrium
as well as non-equilibrium problems. There are other sampling techniques such
as Monte Carlo methods which are exclusively applicable to equilibrium statisti- 
cal mechanical models. In this Chapter, we primarily focus on reactive molecular 
dynamics techniques. 


1.1 Molecular Dynamics, Reactive Force Fields 
and the Concept of Bond Order 


Molecular Dynamics (MD) is a widely adopted method for studying diverse molec- 
ular systems at an atomistic level, ranging from biophysics to chemistry and material 
science. While quantum mechanical (QM) models provide highly accurate results, 
they are of limited applicability in terms of spatial and temporal scales. MD simula- 
tions rely on parameterized force fields that enable the study of larger systems (with 
millions to billions of degrees of freedom) using atomistic models that are computationally
tractable and scalable on large computer systems. Typical applications of
MD range from computational drug discovery to design of new materials.

Fig. 1 Various classical force field interactions employed in atomistic MD simulations

MD is an active field in terms of the development of new techniques. In its most 
conventional form (i.e., classical MD), it relies on the “Born-Oppenheimer approximation”,
where atomic nuclei and the core electrons together are treated as classical
point particles and the interactions of outer electrons are approximated by pairwise 
and “many-body” terms such as bond, angle, torsion and non-bonded interactions, 
and additionally by using variable charge models. Each interaction is described by 
a parametric mathematical formula to compute relevant energies and forces. The 
collection of various interactions used to describe a molecular system is called a 
force field. Figure 1 illustrates interactions commonly used in various force fields.
Equation 1 gives an example of a simple force field where $K_b$, $r_0$, $K_a$, $\theta_0$, $V_d$, $\phi_0$, $\epsilon_{ij}$, $\nu$
and $\sigma_{ij}$ denote parameters that are specific to the types of interacting atoms (which
may be a pair, triplet, or quadruplet of atoms), and $\epsilon$ denotes some global parameter.


$$
V_{\mathrm{tot}} = \sum_{\mathrm{bonds}} K_b (r - r_0)^2 + \sum_{\mathrm{angles}} K_a (\theta - \theta_0)^2 + \sum_{\mathrm{torsions}} \frac{V_d}{2} \left[ 1 + \cos(\nu\phi - \phi_0) \right] + \cdots \tag{1}
$$


Classical MD models, as implemented in highly popular MD software such as
Amber (Case et al. 2021), LAMMPS (Thompson et al. 2022), GROMACS (Hess et 
al. 2008) and NAMD (Phillips et al. 2005), are based on the assumption of static 
chemical bonds and, in general, static charges. Therefore, they are not applicable to 
modeling phenomena where chemical reactions and charge polarization effects play 
a significant role. To address this gap, reactive force fields (e.g., ReaxFF, Senftle 
et al. (2016), REBO, Stuart et al. (2000), Tersoff (1989)) have been developed. 
Functional forms for reactive potentials are significantly more complex than their 
non-reactive counterparts due to the presence of dynamic bonds and charges. The
development of an accurate force field (be it non-reactive or reactive) is a tedious task 
that relies heavily on biological and/or chemical intuition. More recently, machine
learning based potentials have been proposed to alleviate the burden of force field
design and fitting. Even so, the most computationally efficient way to study a large 
reactive molecular system, as would be necessary in a reactive flow application, 
is a well tuned reactive force field model. Hence, this Chapter focuses on reactive 
force fields and specifically on ReaxFF whenever it is necessary to discuss specific 
methods and results, since covering all reactive force field models would necessitate 
a significantly longer discussion. Nevertheless, models and methods discussed for 
ReaxFF are broadly applicable to other reactive force fields, as well. 

Bond order is a key concept in reactive simulations; it models the overlap of 
electronic orbitals. This is intrinsically ambiguous in classical simulations because 
of approximations in assigning bond index and the bond type based on the wave 
function overlaps (Dick and Freund 1983). In classical reactive simulations, bond 
order is defined as a smooth function that vanishes with increasing distance between 
the atoms (van Duin et al. 2001). Clearly, such a function must depend on the envi- 
ronment of the atoms to correctly reproduce valencies. In non-reactive classical 
simulations, bond structure is maintained by either applying constraints on where a 
bond is expected to exist, or by assigning a large energy penalty (typically in the form 
of a harmonic potential, see e.g. Eq. (1)) if the atoms deviate from the expected bond 
length (Frenkel and Smit 2002). In either case, an improperly optimized force field 
can lead to divergent energies or break-down of the constraint algorithms. Reactive 
systems, however, have bond orders that smoothly go to zero, and usually do not 
have this problem but may end up with an un-physical final structure. Recently pro- 
posed ML-based approaches depend only on the atomic positions and sometimes on 
momenta, but do not carry information on molecular topology. Consequently, such 
approaches are well-suited for describing reactive simulations. 
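
To make the notion of a smooth, distance-dependent bond order concrete, the following minimal sketch evaluates a single exponential term of the kind used for uncorrected bond orders in ReaxFF-type potentials. The function name `bond_order` and the values of `r0`, `p1` and `p2` are illustrative assumptions, not fitted force field parameters, and the environment-dependent corrections mentioned above are omitted.

```python
import numpy as np

def bond_order(r, r0=1.5, p1=-0.1, p2=6.0):
    """Smooth bond order exp(p1 * (r / r0)**p2); p1 < 0 so it decays with distance.

    r0, p1 and p2 are placeholder values chosen for illustration only.
    """
    return np.exp(p1 * (np.asarray(r, dtype=float) / r0) ** p2)

distances = np.array([0.0, 1.5, 3.0, 4.5])  # interatomic distances (arbitrary units)
orders = bond_order(distances)
print(orders)  # decreases smoothly from 1 toward 0 as the atoms separate
```

Because the function is smooth and never changes abruptly, forces derived from it remain continuous as a bond forms or breaks, which is the property the text relies on.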


1.2 Accuracy, Complexity, and Transferability 


Three key aspects must be considered when formulating simulation models: (i) Accu- 
racy: A simulation is expected to reproduce structure as well as the chemical reac- 
tions and reaction rates for the model system against the target data. If a model has 
a sufficient number of free parameters, then, in principle, such model can accurately 
describe the physical system. However, the choice of model and its size depend on 
the availability of target training data, which are usually highly-resolved quantum 
chemistry calculations ranging from Density Functional Theory (DFT) to coupled 
cluster theory, along with basis sets specifying the desired level of accuracy; (ii)
Complexity: For any simulation model the complexity increases with the number 
of terms and free parameters in force computations (Frenkel and Smit 2002). Thus, 
accuracy of the model goes hand in hand with its complexity. Ideally, we would 
like to have a high accuracy and low complexity model. Consequently, a clever 
use of target data for extracting accurate results from a relatively simple model or 
alternately, approximations that represent minimal compromise on accuracy for significant
reduction in model complexity are desirable; and (iii) Transferability: The
models are expected to provide physical insight into the system by reproducing
correct properties for different types of systems beyond the training data. This is 
usually achieved by breaking down the interaction terms into corresponding physical
concepts, e.g., bond interaction, angle interaction, shielded 1-4 interaction, etc.
Each of these interactions, although suitably abstracted, represents a physical concept
that is expected to have similar interaction behavior under different conditions.
Thus the total interaction can be computed as a combination of such transferable 
terms (Frenkel and Smit 2002). We note that the target data (usually obtained using 
quantum calculations) are not split into such physical abstractions. This gives rise to 
numerous models with similar accuracy and varying degrees of transferability. Commonly
used reactive potentials such as REBO or ReaxFF are built with transferability
as a key consideration. However, even within the limited domain of atomic types and
environments, these simulations rarely produce accurate results for a wide variety of
problems without requiring a re-tuning of the force field parameters. Unlike fixed
form potential simulations, machine learnt potentials focus on transferability of the
model to atomic environments similar to those in the training datasets and optimize for higher
accuracy as well as lower complexity. 

In the rest of this chapter, we describe how reactive interaction models are con- 
structed, trained, and used in accelerating simulations, in particular by making use 
of ML-based techniques. We begin our discussion with an overview of common ML 
models and methods, followed by their use in the simulation toolchain. 


2 Machine Learning and Optimization Techniques 


We begin our discussion with an overview of general ML techniques. This literature 
is vast and rapidly evolving. For this reason, we restrict ourselves to common ML 
techniques as they apply to reactive particle-based simulations. 

ML frameworks are typically comprised of a model, a suitably specified cost 
function, and a training set over which the cost function is minimized. An ML model 
corresponds to an abstraction of the physical system—e.g., the force on an atom in its 
atomic context, and has a number of parameters that must be suitably instantiated. 
The cost function corresponds to the mismatch between the output of the model 
and physical (experimental or high-resolution simulated) data. Minimizing the cost 
function yields the necessary parametrization of the model. Training data is used to 
match the model output with target distribution. At the heart of ML procedures is the 
optimization technique used to match the model output with the target distribution. 

The cost-function in typical ML applications is averaged over the training set: 


$$
J(\theta) = \mathbb{E}_{(x,y) \sim \hat{p}_{\mathrm{data}}} \, L\left(f(x; \theta), y\right) \tag{2}
$$


Here, $J(\cdot)$ represents the cost-function, $\hat{p}_{\mathrm{data}}$ represents the empirical distribution
(i.e., the training set), $L(\cdot)$ is the loss-function that quantifies the difference between
estimated and true value, and $f(\cdot)$ is a prediction function parameterized by $\theta$.
A key point to note here is that we operate on empirical data, and not the “true”
data distribution. Hence, this approach is also called empirical risk minimization 
(Vapnik 1991). The assumption is that minimizing the loss w.r.t. empirical data will 
(indirectly) minimize the loss w.r.t. true data distribution, thereby allowing for gen- 
eralizability (i.e., to make predictions on unseen data samples). In the rest of this 
section, we discuss continuous and discrete optimization strategies commonly used 
in ML formulations. 
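
As a minimal sketch of Eq. 2, the cost can be evaluated as the mean loss over a training set. The linear model `f`, the squared-error loss and the toy data below are assumptions chosen for illustration; they are not taken from the chapter.

```python
import numpy as np

def f(x, theta):
    """Hypothetical prediction function: a linear model theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x

def loss(y_pred, y_true):
    """Squared-error loss L (an illustrative choice)."""
    return (y_pred - y_true) ** 2

def cost(theta, x_train, y_train):
    """Empirical risk J(theta): mean loss over the training set."""
    return float(np.mean(loss(f(x_train, theta), y_train)))

# Toy training set generated from y = 1 + 2x (no noise).
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = 1.0 + 2.0 * x_train

print(cost(np.array([1.0, 2.0]), x_train, y_train))  # exact parameters -> 0.0
print(cost(np.array([0.0, 2.0]), x_train, y_train))  # constant offset of 1 -> 1.0
```

The expectation in Eq. 2 becomes a plain average because the empirical distribution places equal weight on every training sample.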


2.1 Continuous Optimization for Convex and Non-convex 
Optimization 


In many applications, the objective function in Eq. 2 is continuous and differentiable. 
For such applications, a key consideration is whether the function is convex or non- 
convex (recall that a real-valued convex function is one in which the line joining any 
two points on the graph of the function does not lie below the graph at any point 
in the interval between the two points). Simple approaches to optimizing convex 
functions start from an initial guess, compute the gradient, and take a step along
the negative gradient. This process is repeated until the gradient is sufficiently small (i.e., the
function is close to its minimum). In ML applications, the step size is determined by
the gradient and the learning rate—the smaller the gradient, the lower the step size. 
Convex objective functions arise in models such as logistic regression and single 
layer neural networks. 
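
The iteration described above can be sketched for a convex least-squares objective. The function name `gradient_descent`, the learning rate, the tolerance and the toy data are illustrative assumptions.

```python
import numpy as np

def gradient_descent(grad, theta0, learning_rate=0.1, tol=1e-8, max_iter=10_000):
    """Step against the gradient until its norm falls below tol."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:
            break
        theta = theta - learning_rate * g  # step length scales with the gradient
    return theta

# Convex least-squares objective J(theta) = mean((X @ theta - y)**2).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # generated from y = 1 + 2x

def grad_J(theta):
    return 2.0 * X.T @ (X @ theta - y) / len(y)

theta_hat = gradient_descent(grad_J, theta0=[0.0, 0.0])
print(theta_hat)  # approaches [1, 2]
```

Since the objective is convex, the point of vanishing gradient found here is the global minimum; the same loop applied to a non-convex objective would only guarantee a stationary point.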

In more general ML models such as deep neural networks, the objective function 
(Eq. 2) is not convex. Optimizing non-convex objective functions in high dimen- 
sions is a computationally hard problem. For this reason, most current optimizers 
use a gradient descent approach (or its variant) to find a local minimum in the objective
function space. It is important to note that a point of zero gradient may be a local minimum
or a saddle point. Common solvers rely on randomization and noise introduced
by sampling to escape saddle points. In deep learning applications, the problem 
of computing the gradient can be elegantly cast as a backpropagation operator— 
making it computationally simple and inexpensive. Optimization methods that use 
the entire training set to compute the gradient are called batch or deterministic meth- 
ods (Rumelhart et al. 1986). Methods that operate on small-subsets of the dataset 
(called minibatches) are called stochastic methods. In this context, a complete pass 
over the training dataset sampled in minibatches is called an epoch. Stochastic Gradi- 
ent Descent (SGD) methods are workhorses for training deep neural network models. 
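
A minimal sketch of minibatch SGD, showing how minibatches and epochs relate, is given below. The least-squares loss, the batch size, the learning rate and the toy data are assumptions made for illustration.

```python
import numpy as np

def sgd(X, y, batch_size=2, learning_rate=0.05, epochs=2000, seed=0):
    """Minibatch SGD for the least-squares loss mean((X @ theta - y)**2)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):                 # one epoch = one full pass over the data
        order = rng.permutation(n)          # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ theta - y[idx]
            grad = 2.0 * X[idx].T @ residual / len(idx)  # gradient on the minibatch only
            theta -= learning_rate * grad
    return theta

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # generated from y = 1 + 2x
theta_sgd = sgd(X, y)
print(theta_sgd)  # approaches [1, 2]
```

Setting `batch_size` equal to the number of samples recovers the batch (deterministic) method described above.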

First order methods such as SGD suffer from slow convergence, lack of robust- 
ness, and need for tuning a large number of hyperparameters. Indeed, model training 
using SGD-type methods incurs most of its computation cost in exploring the high- 
dimensional hyperparameter space to find model parametrizations with high accuracy 
and generalization properties (Goodfellow et al. 2014). These problems have motivated
significant recent research in the development of second order methods and their
variants. Second order methods scale different components of the gradient suitably
to accelerate convergence. They also typically have far fewer hyperparameters,
making the training process much simpler. However, these methods involve a product 
with the inverse of the dense Hessian matrix, which is computationally expensive. 
Solutions to these problems include statistical sampling, low-rank structures, and 
Kronecker products as approximations for the Hessian. 
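
The contrast between first and second order updates can be illustrated on a small quadratic objective with very different curvatures along its two axes. The matrix `A`, the vector `b` and the learning rate are assumptions chosen for this sketch; the Newton step uses the exact Hessian, which is only affordable here because the problem is tiny.

```python
import numpy as np

# Quadratic objective J(theta) = 0.5 * theta^T A theta - b^T theta with an
# ill-conditioned Hessian A (curvatures 1 and 100).
A = np.array([[1.0, 0.0], [0.0, 100.0]])
b = np.array([1.0, 1.0])

def grad(theta):
    return A @ theta - b

def newton_step(theta):
    """Second order update: solve H d = g instead of forming the inverse Hessian."""
    return theta - np.linalg.solve(A, grad(theta))

def gradient_step(theta, learning_rate=0.01):
    """First order update; the learning rate is limited by the largest curvature."""
    return theta - learning_rate * grad(theta)

theta0 = np.zeros(2)
theta_newton = newton_step(theta0)       # exact minimizer after one step on a quadratic
theta_gd = theta0
for _ in range(100):
    theta_gd = gradient_step(theta_gd)   # still far from converged along the flat direction

print(theta_newton, theta_gd)
```

The Hessian rescales each gradient component by the local curvature, which is why a single step suffices here while gradient descent is still far from the minimum along the low-curvature direction.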


2.2 Discrete Optimization 


In contrast to continuous optimization, in many applications, the variables and the 
objective function take discrete values, and thus the derivative of the objective func- 
tion may not exist. This is often the case when optimizing parameters for force 
fields in atomistic models. Two major classes of techniques for discrete optimization 
are Integer Programming and Combinatorial Optimization. In Integer Programming, 
some (or all) variables are restricted to the space of integers, and the goal is to mini- 
mize an objective subject to specified constraints. In combinatorial optimization, the 
goal is to find the optimal object from a set of feasible discrete objects. Combinatorial 
optimization functions operate on discrete structures such as graphs and trees. The 
class of discrete optimization problems is typically computationally hard. 

A commonly used discrete optimization procedure in optimization of force fields 
is the genetic algorithm (Katoch et al. 2021; Mirjalili 2019). A genetic algorithm
starts with a population of potentially suboptimal candidate solutions. It successively 
selects from this population (formally called selection) and combines them (formally 
called crossover) to generate new candidates. In many variants, mutations are intro- 
duced into the candidates to generate new candidates as well. A fitness function is 
used to screen these new candidates and the fittest candidates are retained in the 
population. This process is repeated until the best candidates achieve desired fit- 
ness. In the context of force-field optimization, the process is initialized with a set 
of parametrizations. The fitness function corresponds to the accuracy with which 
the candidate reproduces training data. The crossover function generates new candi- 
dates through operations such as exchange of corresponding parameters, min, max, 
average, and other simple operators. 
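
The loop of selection, crossover, mutation and screening can be sketched for a two-parameter fit. Everything specific here is an assumption for illustration: the harmonic bond model, the reference values `K = 300` and `r0 = 1.1`, the parameter bounds, the uniform random choice of parents and the mutation scales.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reference data from a harmonic bond E = K * (r - r0)**2 with K = 300, r0 = 1.1
# (illustrative values standing in for quantum-chemistry training data).
r = np.linspace(0.9, 1.3, 9)
E_ref = 300.0 * (r - 1.1) ** 2

def error(params):
    """Mismatch with the training data; lower error means higher fitness."""
    K, r0 = params
    return float(np.mean((K * (r - r0) ** 2 - E_ref) ** 2))

def evolve(pop_size=30, generations=100, mutation_scale=(10.0, 0.02)):
    low, high = np.array([100.0, 0.8]), np.array([500.0, 1.4])
    population = rng.uniform(low, high, size=(pop_size, 2))
    history = []
    for _ in range(generations):
        # Selection: draw random pairs of parents from the current population.
        parents = population[rng.integers(0, pop_size, size=(pop_size, 2))]
        # Crossover: each parameter is taken from one parent or the other.
        mask = rng.random((pop_size, 2)) < 0.5
        children = np.where(mask, parents[:, 0], parents[:, 1])
        # Mutation: small random perturbation of every parameter.
        children = children + rng.normal(0.0, mutation_scale, size=children.shape)
        # Screening: keep the fittest candidates among parents and children.
        merged = np.vstack([population, children])
        errors = np.array([error(p) for p in merged])
        population = merged[np.argsort(errors)[:pop_size]]
        history.append(float(errors.min()))
    return population[0], history

best, history = evolve()
print(best, history[0], history[-1])
```

Because the previous population competes with its children in the screening step, the best error recorded in `history` can never increase from one generation to the next.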


3 Machine Learning Models 


While the field of ML is vast, it is common to classify ML algorithms into “super- 
vised” and “unsupervised”. In supervised learning algorithms, training data contain 
both features and labels. The goal is to learn a function that takes as input a feature 
vector and returns a predicted label. Supervised learning can further be categorized 
into classification and regression. When labels are categorical, the learning task is 
commonly called “classification”. On the other hand, if the task is to predict a continuous
numerical value, it is called regression. In unsupervised learning algorithms,
training data do not have labels. The goal of unsupervised algorithms is to analyze 
patterns in data without requiring annotation. Common examples of unsupervised 
algorithms include clustering and dimensionality reduction. We note that there are 
many other active areas of ML, such as reinforcement learning and semi-supervised 
learning that are beyond the scope of this chapter. We refer interested readers to 
more exhaustive sources for a comprehensive discussion (Bishop and Nasrabadi 
2006; Murphy 2012; Shalev-Shwartz and Ben-David 2014; Goodfellow et al. 2016). 


3.1 Unsupervised Learning 


The most commonly used unsupervised learning techniques are clustering and dimen- 
sionality reduction. 


3.1.1 Clustering 


In clustering, data represented as vectors are grouped together on the basis of some 
inherent structures (or patterns), typically characterized by their similarities or dis- 
tances (Saxena et al. 2017; Gan et al. 2020). Clustering algorithms can be categorized 
on the basis of their outputs into: (i) crisp versus overlapping; or (ii) hard versus soft. 
In crisp clustering, each data point is assigned to exactly one cluster, whereas over- 
lapping clustering algorithms allow for multiple memberships for each data point. 
In hard clustering algorithms, a data-point is assigned a 0/1 membership to every 
cluster (a 1 corresponding to the cluster the point is assigned to). In soft clustering 
algorithms, each data point is assigned membership grades (typically in a 0-1 range) 
that indicate the degree to which data points belong to each cluster. If the grades are 
convex (i.e., they are positive and sum to 1), then the grades can be interpreted as 
probabilities with which a data point belongs to each of the classes. In the general 
class of fuzzy clustering algorithms (Ruspini 1969), the convexity condition is not 
required. 

Centroid-based clustering refers to algorithms where each cluster is represented by 
a single, “central” point, which may not be a part of the dataset. The most commonly 
used algorithm for centroid-based clustering (and indeed all of clustering) is k-means 
algorithm of Lloyd (1982). Given a set of data-points $[x_1, x_2, \ldots, x_n]$, and a
predefined number of clusters $k$, the objective function of k-means is given by:


$$
\underset{C}{\arg\min} \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2 \tag{3}
$$


where $C$ is the collection of non-overlapping clusters ($C = \{C_1, C_2, \ldots, C_k\}$), and $\mu_i$
represents the mean of all data-points belonging to cluster $i$. Stated otherwise, the
objective of k-means clustering is to minimize the squared distance between data-points and
their assigned clusters (as represented by the mean). The problem of k-means is NP-hard,
but approximation algorithms such as Lloyd’s Algorithm can efficiently find
local optima.
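
A minimal sketch of Lloyd's algorithm for the objective in Eq. 3 follows. The function name `lloyd_kmeans`, the random choice of initial centroids from the data and the toy points are assumptions for illustration.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # initial guess: k data points
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of points in the plane.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centroids = lloyd_kmeans(X, k=2)
print(labels)
```

Each of the two steps can only lower the objective or leave it unchanged, so the iteration settles at a local optimum that depends on the initial centroids.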

Distribution-based clustering algorithms work on the assumption that data-points 
belonging to the same cluster are drawn from the same distribution. Common algo- 
rithms in this class assume that data follow Gaussian Mixture Models, and typically 
solve the problem using the Expectation-Maximization (EM) Approach. EM does 
maximum likelihood estimation in the presence of latent variables. In each iteration, 
there are two steps. In the first step, latent variables are estimated (E-step). This is 
followed by the Maximization (M-step) where parameters of the models are opti- 
mized to better fit the data. In fact, the aforementioned Lloyd’s algorithm for k-means 
clustering is a simple instance of EM. 
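
The two steps of EM can be sketched for a one-dimensional mixture of two Gaussians. The function name `em_gmm_1d`, the initialization from the data range and the synthetic data are illustrative assumptions.

```python
import numpy as np

def em_gmm_1d(x, n_iter=200):
    """EM for a two-component 1D Gaussian mixture."""
    # Crude initialization from the data range (an illustrative choice).
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point (the latent variables).
        dens = weight * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from responsibility-weighted data.
        n_k = resp.sum(axis=0)
        weight = n_k / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / n_k
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)
    return weight, mu, sigma

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 1.0, 500)])
weight, mu, sigma = em_gmm_1d(x)
print(weight, mu, sigma)
```

Replacing the soft responsibilities by a hard 0/1 assignment to the nearest mean, with fixed equal variances, turns the loop into Lloyd's algorithm.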

Density-based clustering is a class of spatial-clustering algorithms, in which a 
cluster is modeled as a dense region in data space that is spatially separated from 
other clusters. Density-based spatial clustering of applications with noise (DBSCAN) 
by Ester et al. (1996) is the most commonly used algorithm in this class. DBSCAN 
requires two parameters: (i) $\epsilon$, the size of the neighborhood; and (ii) Minpts, the minimum
number of points in each cluster. DBSCAN proceeds as follows. First, it finds the
points that lie in the $\epsilon$-neighborhood of each point. Then, it designates points with
more than Minpts neighbors as “core-points”. Next, it finds connected components
of core-points by inspecting the neighbors of each core-point. Finally, each non-core-point
is assigned to a cluster if it lies in the $\epsilon$-neighborhood of one of that cluster’s core-points. If a data-point is not in
the neighborhood of any core-point, it is identified as an outlier, or noise (Schubert et al. 2017).
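
The DBSCAN steps can be sketched as follows. This is a simplified illustration rather than a reference implementation: it builds the full distance matrix, and it assumes the common convention that a point is a core-point when its neighborhood, counting the point itself, holds at least `min_pts` points. The function name and toy data are hypothetical.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]  # includes i itself
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Grow a connected component of core-points starting from point i.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster      # border points join the cluster ...
                    if is_core[q]:
                        stack.append(q)      # ... but only core-points extend it
        cluster += 1
    return labels

# Two small squares of points and one isolated point.
X = np.array([[0.0, 0.0], [0.0, 0.5], [0.5, 0.0], [0.5, 0.5],
              [10.0, 10.0], [10.0, 10.5], [10.5, 10.0], [10.5, 10.5],
              [5.0, 5.0]])
labels = dbscan(X, eps=0.8, min_pts=3)
print(labels)  # [ 0  0  0  0  1  1  1  1 -1]
```

Unlike k-means, the number of clusters is not specified in advance; it follows from the density parameters.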

Hierarchical clustering refers to a family of clustering algorithms that seeks to 
build a hierarchy of the clusters (Maimon and Rokach 2005). The two common 
approaches to build these hierarchies are bottom-up and top-down. In bottom-up 
(or agglomerative) clustering, each data-point initially belongs to a separate cluster. 
Small clusters are created on the basis of similarity (or proximity). These clusters 
are merged repeatedly until all data-points belong to a single cluster. The reverse 
process is performed in the top-down (or divisive) clustering approaches, where a 
single cluster is split repeatedly until each data-point is its own cluster. The main 
parameters to choose are the metric (i.e., the distance measures), and the linkage 
criterion. Commonly used metrics are L-1, L-2 norms, Hamming distance, and inner 
products. Linkage criterion quantifies distance between two clusters on the basis of 
distances between pairs of points across the clusters. 
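
A bottom-up clustering loop with a pluggable linkage criterion can be sketched as below. The function name `agglomerative`, the choice of the L-2 metric and the stopping rule based on a target number of clusters are assumptions for illustration; the naive search over cluster pairs is only suitable for small datasets.

```python
import numpy as np

def agglomerative(X, n_clusters, linkage=min):
    """Bottom-up clustering; linkage=min is single linkage, linkage=max is complete linkage."""
    clusters = [[i] for i in range(len(X))]          # every point starts as its own cluster
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # L-2 metric
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Linkage criterion: reduce the pairwise point distances to one number.
                d = linkage(dists[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]      # merge the two closest clusters
        del clusters[b]
    return clusters

X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
result = agglomerative(X, n_clusters=2)
print(result)  # [[0, 1, 2], [3, 4]]
```

Recording the sequence of merges instead of stopping at a fixed count would give the full hierarchy.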


3.1.2 Dimensionality Reduction 


Dimensionality reduction is an unsupervised technique common to many applica- 
tions. Reducing dimensions produces a parsimonious denoised representation of data 
that is amenable to analysis by complex algorithms that would otherwise not be able 
to handle large amounts of raw data.

Linear Dimensionality Reduction Techniques

Principal component analysis (PCA) is perhaps the most commonly used linear 
dimension reduction technique. Principal components correspond to directions of 
maximum variation in data. Projecting data on to these directions, consequently, 
maintains dominant patterns in data. The first step in PCA is to center the data around 
zero mean to ensure translational invariance. This is done by computing the mean of 
the rows of data matrix M and subtracting it from each row to give a zero-centered data 
matrix M’. A covariance matrix is then computed as the normalized form of $M'^{T} M'$.
Note that the (i, j)th element of this covariance matrix is simply the covariance of the
ith and jth columns of matrix M’. The dominant directions in this covariance matrix are
then computed as the dominant eigenvectors of this matrix. Selecting the k dominant
eigenvectors and projecting the data matrix M to this subspace yields a k dimensional
data matrix that best preserves variances in data. A common approach to selecting k
is to consider the drop in magnitude of corresponding eigenvalues. PCA has several
advantages: (i) by reducing the effective dimensionality of data, it reduces the cost of
downstream processing; (ii) by retaining only the dominant directions of variance,
it denoises the data; and (iii) it provides theoretical bounds on loss of accuracy in
terms of the dropped eigenvalues. 
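
The PCA steps above can be sketched with an eigendecomposition of the covariance matrix. The function name `pca`, the normalization by the number of samples minus one and the synthetic data are assumptions for illustration; each row of `M` is taken to be one sample.

```python
import numpy as np

def pca(M, k):
    """Project the rows of M (one sample per row) onto the k dominant principal directions."""
    M_centered = M - M.mean(axis=0)                     # subtract the mean row
    cov = M_centered.T @ M_centered / (len(M) - 1)      # normalized M'^T M'
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                   # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return M_centered @ eigvecs[:, :k], eigvals

# Points lying close to the line y = 2x: one direction carries almost all the variance.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
M = np.column_stack([t, 2.0 * t + 0.01 * rng.normal(size=200)])
projected, eigvals = pca(M, k=1)
print(eigvals / eigvals.sum())   # the first entry is close to 1
```

The sharp drop between the first and second eigenvalue is the signal one would use to choose k = 1 for this dataset.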

The general class of dimensionality reduction techniques also includes other 
matrix decomposition techniques. In general, these techniques express matrix data M 
as an approximate product of two matrices $UV^{T}$; i.e., they minimize $\|M - UV^{T}\|$.
Various methods impose different constraints on matrices U and V, leading to a 
general class of methods that range from dimension reduction to commonly used 
clustering techniques. Perhaps, the best-known technique in this class is Singular 
Value Decomposition (SVD) (Golub and Reinsch 1971), which is closely related to 
PCA, where columns of U and V are orthogonal, and rank-k for some value of k. The 
orthogonality of the column space of these matrices makes them hard to interpret 
directly in the data space. 
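
As a small illustration of the factorization view, the sketch below builds a rank-k approximation from a truncated SVD. The function name `truncated_svd` and the convention of absorbing the singular values into the first factor are assumptions made for this example.

```python
import numpy as np

def truncated_svd(M, k):
    """Best rank-k approximation of M in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k].T    # factors (U_k S_k) and V_k, so M ~ (U_k S_k) V_k^T

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))   # exactly rank 2 by construction
US, V = truncated_svd(M, k=2)
print(np.linalg.norm(M - US @ V.T))   # numerically zero
```

The columns of `V` are orthonormal, which gives the optimal error but also means individual columns mix positive and negative contributions and rarely resemble any single data point.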

In contrast to SVD, if matrix U is constrained to only positive entries and columns 
in matrix U sum to 1, we get a decomposition called archetypal analysis. In this inter- 
pretation, columns of V correspond to the corners of a convex hull of the points in 
matrix M, also known as pure-samples or archetypes, and all data points are expressed 
as convex combinations of these archetypes. A major advantage of archetypal analysis
is that archetypes are directly interpretable in the data space. Another closely
related decomposition is non-negative matrix factorization (NMF), which relaxes 
the orthogonality constraint of SVD, instead, constraining elements of matrix U to 
be non-negative (Gillis 2020). In doing so, it loses error norm minimization proper- 
ties of SVD, but gains interpretability. All of these methods can be used to identify 
patterns of coherent behavior among particles in the simulation. We refer interested 
readers to a comprehensive survey on linear dimensionality reduction methods by 
Cunningham and Ghahramani (Cunningham and Ghahramani 2015). 


Non-linear Dimensionality Reduction 
General non-linear dimensionality reduction techniques are needed for data that 
resides on complex non-linear manifolds. This is commonly the case for particle 
datasets in reactive environments. Non-linear dimensionality reduction techniques 
typically operate in three steps: (i) embedding of data onto a low-dimensional 
manifold (in a high-dimensional space); (ii) defining suitable distance measures; and 
(iii) reducing dimensionality to preserve distance measures. One of the more common 
non-linear dimensionality reduction techniques is isometric feature mapping (Isomap). 
This technique first constructs a graph corresponding to the dataset by associating a 
node with each row of the data matrix, and edges that connect each node to its k 
nearest neighbors. This graph is then used to define distances between nodes in terms 
of shortest paths. Finally, techniques such as multidimensional scaling (MDS), a 
generalization of PCA that can use general distance matrices rather than the 
covariance matrices used by PCA, are used to compute low-dimensional representations 
of the matrix. An alternate approach uses the spectrum of a Laplace operator defined on 
the manifold to embed data points in a lower dimensional space. Such techniques 
fall into the general class of Laplacian eigenmaps. 
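The three stages of isometric feature mapping (neighbor graph, shortest-path distances, MDS) can be sketched as follows. This is a small-data illustration under stated assumptions: the function name `isomap` is hypothetical, shortest paths are computed with a dense Floyd-Warshall loop (cubic cost), the neighbor graph is assumed to be connected, and classical MDS is used for the final step.

```python
import numpy as np


def isomap(data, n_neighbors, n_components):
    n = len(data)
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)

    # Step 1: symmetric k-nearest-neighbor graph (inf marks "no edge").
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]  # skip the point itself
    for i in range(n):
        graph[i, nearest[i]] = dist[i, nearest[i]]
        graph[nearest[i], i] = dist[i, nearest[i]]

    # Step 2: geodesic distances as shortest paths (Floyd-Warshall).
    for m in range(n):
        graph = np.minimum(graph, graph[:, [m]] + graph[[m], :])

    # Step 3: classical MDS on the geodesic distance matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (graph ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
```

For points sampled along a curved arc, the one-dimensional embedding recovers the position along the arc rather than the straight-line distance between points.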

An alternate approach to non-linear dimensionality reduction is the use of non- 
linear transformations on data in conjunction with a suitable distance measure, fol- 
lowed by MDS for dimensionality reduction. The first two steps of this process 
(non-linear transformation and distance measure computation) are often integrated 
into a single step through the specification of a kernel. The use of such a kernel with 
MDS is called kernel PCA. The key challenges in the use of these methods relate 
to: (i) suitable representation techniques (described in Sect. 5); (ii) kernel functions; 
and (iii) appropriate scaling mechanisms since distance matrices can have highly 
skewed distributions and the directions may be dominated by a small number of 
very large entries in the distance matrix. Common approaches to kernel selection 
rely on polynomial transformations of increasing degree until a suitable spectral gap 
is observed. Data representations and normalization are highly application- and 
context-dependent. 
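A minimal sketch of kernel PCA with a polynomial kernel is given below. The function name `kernel_pca` and the specific kernel form (inner product plus one, raised to a degree) are assumptions made for illustration; the returned eigenvalues can be inspected for a spectral gap as the degree is increased.

```python
import numpy as np


def kernel_pca(data, degree, n_components):
    """Embed rows of `data` using a polynomial kernel of the given degree."""
    kernel = (data @ data.T + 1.0) ** degree
    n = len(kernel)
    one = np.full((n, n), 1.0 / n)
    # Center the kernel matrix, i.e., center the data in the implicit feature space.
    centered = kernel - one @ kernel - kernel @ one + one @ kernel @ one
    eigvals, eigvecs = np.linalg.eigh(centered)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scale = np.sqrt(np.maximum(eigvals[:n_components], 0.0))
    return eigvecs[:, :n_components] * scale, eigvals
```

With degree 1 the centered kernel is the Gram matrix of the centered data, so the embedding coincides with linear PCA up to rotation and sign; higher degrees introduce the non-linear transformation.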


Autoencoder and Deep Dimensionality Reduction 

Autoencoders have been recently proposed for use in non-linear dimensionality 
reduction (Kramer 1991; Schmidhuber 2015; Goodfellow et al. 2016). Autoen- 
coders are feed-forward neural networks (discussed in further detail in Sect. 3.2) 
that are trained to code the identity function—i.e., the output of the autoencoder 
neural network is the input itself. Dimensionality reduction is accomplished in this 
framework by having an intermediate (bottleneck) layer with a small number of 
neurons. Through this constraint, an autoencoder is trained to “encode” input data into 
a low-dimensional latent space and then “decode” this representation back to the 
input. The output of the encoder therefore represents a non-linear, reduced-dimension 
representation of the input. 


T-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold 
Approximation and Projection (UMAP) 

t-SNE (Maaten and Hinton 2008) and UMAP (McInnes et al. 2018) are commonly 
used non-linear dimensionality reduction techniques for mapping data to two or 
three dimensions, primarily for visual analysis. t-SNE computes two probability 
distributions: one in the high-dimensional space and one in the low-dimensional 
space. In each distribution, pairs of points that are close to each other (in 
Euclidean distance) are assigned high probability. In the high-dimensional 
space, a Gaussian distribution is centered at each data point, and a conditional 
probability is estimated for all other data points. These conditional probabilities 
are normalized to generate a global probability distribution over all points. For 
points in the low-dimensional space, t-SNE uses a Cauchy distribution to compute the 
probability distribution. The goal of dimensionality reduction translates to 
minimizing the distance (in terms of KL divergence) between these two distributions. 
This is typically done using gradient descent. UMAP, a closely related technique, 
instead assumes that the data is uniformly distributed on a locally connected 
Riemannian manifold and that the Riemannian metric is locally constant or 
approximately locally constant (https://umap-learn.readthedocs.io/en/latest/). Both of 
these techniques are 
extensively used in visualization of high-dimensional data. 
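The two t-SNE distributions and the objective that compares them can be sketched as follows. This is a simplified illustration: the function names are hypothetical, and a single fixed Gaussian bandwidth `sigma` is assumed for all points, whereas practical implementations tune a separate bandwidth per point.

```python
import numpy as np


def squared_distances(points):
    sq = np.sum(points ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * points @ points.T, 0.0)


def high_dim_affinities(data, sigma):
    """Gaussian conditional probabilities, symmetrized into one joint distribution."""
    logits = -squared_distances(data) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)              # a point is not its own neighbor
    conditional = np.exp(logits)
    conditional /= conditional.sum(axis=1, keepdims=True)
    return (conditional + conditional.T) / (2.0 * len(data))


def low_dim_affinities(embedding):
    """Cauchy (Student-t with one degree of freedom) joint probabilities."""
    kernel = 1.0 / (1.0 + squared_distances(embedding))
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum()


def kl_divergence(P, Q, eps=1e-12):
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))
```

A t-SNE optimizer would repeatedly adjust the low-dimensional coordinates by gradient descent so that `kl_divergence(P, Q)` decreases.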


3.2 Supervised Learning 


The goal of supervised methods is to learn a function from input data vectors to 
output classes (labels) using training input-output examples. The function should 
“generalize” to be able to accurately predict labels for unseen inputs. The general 
learning procedure is as follows: first, the data is split into train and test sets. Then, 
the function is learnt by using the input-output training examples. The learnt function 
is applied to the test input to get predicted outputs. If the algorithm performs poorly 
even on training examples, we say that the algorithm “underfits” the data. This 
typically occurs when the model is unable to capture the complexity of the data. When 
learnt functions perform well on training data but poorly (say, low prediction 
accuracy) on test data, we say that the algorithm “overfits” to the train set. 
Overfitting occurs when the algorithm fits to noise, rather than true data patterns. 
The problem of balancing underfitting and overfitting is called the bias-variance 
tradeoff. Intuitively, we want the model to be sophisticated enough to capture complex 
data patterns, but we do not want to endow it with the ability to capture 
idiosyncrasies of the training examples. 

The problem of overfitting can be controlled through a number of approaches. 
In cross-validation, the training set is further divided into subsets (or folds). In 
each iteration, the function is learnt with one fold left out, and the model is then 
validated on that held-out fold. The parameters of the model are optimized to ensure 
high cross-validation accuracy. Regularization is a technique in which a penalty term 
is added to the error function to prevent overfitting. Tikhonov regularization is one 
of the early examples of regularization that is commonly used in linear regression. 
Early stopping is a form of regularization for learners that use iterative methods 
like gradient descent. The key idea of early stopping is to continue training only as 
long as the learning algorithm continues to improve performance on external (unseen) 
data; training is stopped when improvement in training performance comes at the 
expense of performance on this held-out data. Other approaches to avoid overfitting 
include data augmentation (increasing the number of data points for training) and 
improved feature selection. Underfitting can be avoided by using more complex models 
(e.g., going from a linear to a non-linear model), increasing training time, and 
reducing regularization. 
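Cross-validation and Tikhonov regularization can be combined in a short sketch: the regularization weight of a ridge (Tikhonov-regularized) linear regression is chosen as the candidate with the lowest k-fold validation error. The function names and the use of contiguous, unshuffled folds are assumptions made to keep the example small.

```python
import numpy as np


def ridge_fit(X, y, penalty):
    """Least squares with a Tikhonov (L2) penalty on the weights."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(d), X.T @ y)


def cross_validated_error(X, y, penalty, n_folds=5):
    folds = np.array_split(np.arange(len(X)), n_folds)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        w = ridge_fit(X[train], y[train], penalty)          # learn without the fold
        residual = X[held_out] @ w - y[held_out]            # validate on the fold
        errors.append(np.mean(residual ** 2))
    return float(np.mean(errors))


def select_penalty(X, y, candidates, n_folds=5):
    return min(candidates, key=lambda p: cross_validated_error(X, y, p, n_folds))
```

In practice the data would usually be shuffled before splitting, and a separate test set would be kept aside to report the final accuracy.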


3.2.1 Overview of Supervised Learning Algorithms 


Supervised learning algorithms are often categorized as generative or discriminative. 
Generative algorithms aim to learn the distribution of each class of data, whereas dis- 
criminative algorithms aim to find boundaries between different classes. Naive Bayes 
Classifier is a generative approach that uses the Bayes Theorem with strong assump- 
tions on independence between the features (Rish 2001). Given a d-dimensional 
data vector $x = [x_1, x_2, \ldots, x_d]$, naive Bayes models the probability that $x$ 
belongs to class $k$ as follows: 

$$P(C_k \mid x) \propto p(C_k) \prod_{i=1}^{d} p(x_i \mid C_k) \tag{4}$$


In practice, the parameters for the distributions of features are estimated using 
maximum-likelihood estimations. Despite the strong assumptions made in naive 
Bayes, it works well in many practical settings. Linear Discriminant Analysis (LDA) 
is a binary classification algorithm that models the conditional probability densities 
$P(x \mid C_k)$ as normal distributions with parameters $(\mu_k, \Sigma_k)$, where 
$k \in \{0, 1\}$ (McLachlan 2005). The simplifying assumption of homoscedasticity 
(i.e., the covariance matrices are the same for both classes, 
$\Sigma_0 = \Sigma_1 = \Sigma$) means that the classifier predicts class 1 if: 

$$x^T \Sigma^{-1} (\mu_1 - \mu_0) > \frac{1}{2} (\mu_1 - \mu_0)^T \Sigma^{-1} (\mu_1 + \mu_0) \tag{5}$$


More complex generative methods include Bayesian Networks and Hidden Markov 
Models. 
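The LDA decision rule of Eq. 5 can be sketched as follows. The function names are hypothetical, the shared covariance is estimated by pooling both classes, and equal class priors are assumed (otherwise the threshold acquires an additional term involving the prior ratio).

```python
import numpy as np


def fit_lda(X0, X1):
    """Estimate class means and the pooled (shared) covariance matrix."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - mu0, X1 - mu1])
    sigma = centered.T @ centered / (len(centered) - 2)
    return mu0, mu1, sigma


def predict_lda(x, mu0, mu1, sigma):
    """Return 1 if x falls on the class-1 side of the linear boundary, else 0."""
    w = np.linalg.solve(sigma, mu1 - mu0)        # Sigma^{-1} (mu1 - mu0)
    threshold = 0.5 * w @ (mu1 + mu0)
    return int(x @ w > threshold)
```

Because the covariance is shared, the boundary between the two classes is a hyperplane with normal vector `w`.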

The k-Nearest Neighbor (k-NN) algorithm is an early, and still widely used, 
discriminative algorithm for both classification and regression. In classification, the 
label of a test data sample is obtained by a vote of the labels of its k-nearest 
neighbors. In regression, k-NN computes the predicted value of a test sample as a 
function of the corresponding values of its k-nearest neighbors. Logistic regression 
uses a logistic (sigmoid) function to model a binary dependent variable. In the 
training phase, the parameters of the logistic function are learnt. Logistic 
regression is similar to LDA, but with fewer assumptions. 
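A minimal k-NN sketch for both classification and regression is shown below. The function names are hypothetical, Euclidean distance is assumed, and the regression variant uses the plain average of the neighbor values as the aggregating function.

```python
import math
from collections import Counter


def k_nearest(train_x, query, k):
    """Indices of the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return order[:k]


def knn_classify(train_x, train_y, query, k):
    votes = Counter(train_y[i] for i in k_nearest(train_x, query, k))
    return votes.most_common(1)[0][0]            # majority label


def knn_regress(train_x, train_y, query, k):
    neighbors = k_nearest(train_x, query, k)
    return sum(train_y[i] for i in neighbors) / k
```

For example, with three points labeled "a" near the origin and three labeled "b" near (5, 5), a query at (0.5, 0.5) with k = 3 is assigned label "a".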

Support Vector Machine (SVM) (Cortes and Vapnik 1995) is a widely used discriminative 
model for regression and classification. Given input data $[x_1, x_2, \ldots, x_n]$ 
and corresponding labels $y_1, y_2, \ldots, y_n$, where $y_i \in \{-1, 1\}$, 
$\forall i \in \{1, 2, \ldots, n\}$, SVM aims to optimize the following objective function: 

$$\text{minimize} \quad \lambda \|w\|^2 + \sum_{i=1}^{n} \max\left(0,\, 1 - y_i (w \cdot x_i - b)\right) \tag{6}$$

Here, vector $w$ represents the vector normal to the separating hyperplane and $\lambda$ is 
the weight given to regularization. The max(.) term is called the hinge-loss function, 
which allows SVMs to tolerate training points that fall on the wrong side of the 
margin. To learn non-linear boundaries, SVMs typically use the so-called “kernel 
trick”. The idea is that an implicit high-dimensional representation of raw data can 
let linear learning algorithms learn non-linear boundaries. The kernel 
function itself is a similarity measure. Common kernels include Fisher, Polynomial, 
Radial Basis Function (RBF), Gaussian, and Sigmoid functions. Other examples of 
discriminative methods include decision trees and random forests. 
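The objective in Eq. 6 can be minimized for a linear SVM with plain subgradient descent, as sketched below. The function names, step size, and iteration count are assumptions chosen for a toy problem; library implementations use more sophisticated solvers and support kernels.

```python
import numpy as np


def svm_objective(w, b, X, y, lam):
    """Regularization term plus the summed hinge loss."""
    margins = y * (X @ w - b)
    return lam * (w @ w) + np.sum(np.maximum(0.0, 1.0 - margins))


def train_linear_svm(X, y, lam=0.01, learning_rate=0.01, n_steps=2000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_steps):
        violating = y * (X @ w - b) < 1.0        # points with non-zero hinge loss
        y_v, X_v = y[violating], X[violating]
        grad_w = 2.0 * lam * w - (y_v[:, None] * X_v).sum(axis=0)
        grad_b = y_v.sum()
        w = w - learning_rate * grad_w
        b = b - learning_rate * grad_b
    return w, b
```

Predicted labels are obtained as the sign of `X @ w - b`; only points that violate the margin contribute to the subgradient in each step.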


3.2.2 Neural Networks 


Neural Networks are interconnected groups of units called neurons that are organized 
in layers. The first layer is called the input layer, and is typically the same dimension 
as the input. The final layer is called the output layer. The outputs of neural networks 
could be prediction of class labels, images, text, etc. Each neuron in an intermediate 
layer is given a number of inputs. It computes a non-linear function on a weighted 
sum of its input. The resulting output may be fed into a number of neurons in the 
next layer. The non-linear function associated with a neuron is called an activation 
function. Common examples of activation functions include hyperbolic tangent (tanh), 
sigmoid, Rectified Linear Unit (ReLU), and Leaky ReLU, among many others. 
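The computation performed by one layer of neurons, a non-linear activation applied to a weighted sum of the inputs, can be written compactly. The names below are illustrative, and the bias term is a common addition that the text does not mention explicitly.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dense_layer(inputs, weights, bias, activation):
    """Each column of `weights` holds the input weights of one neuron."""
    return activation(inputs @ weights + bias)
```

Stacking several such layers, with the output of one layer used as the input of the next, gives a feed-forward network; `np.tanh` can be passed directly as the activation.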

There are two key steps to designing neural networks for specific tasks. The first 
step corresponds to design of the network architecture. This specifies the number of 
layers, connectivity, and types of neurons. The second step parametrizes weights on 
edges of the neural network using a suitable optimization procedure for matching 
the output distribution with the target distribution (as discussed earlier in Sect. 2.1). 

The term deep learning is used to describe a family of machine learning mod- 
els and methods whose architectures use neural networks as core components. The 
word “deep” refers to the fact that learning algorithms typically use neural 
network models with many layers, in contrast to shallow networks which typically 
have one or two intermediate (or hidden) layers (Schmidhuber 2015). 


3.2.3 Convolutional Neural Networks 


Convolutional Neural Networks (CNNs) are neural networks that use convolutions 
to quantify local pattern matches. CNNs are feed-forward networks with one or more 
convolution layers. CNNs are used extensively in the analysis of images, and more 
recently, graphs that model connected structures such as molecules. CNNs have an 
input layer, hidden layer(s), and an output layer. The input to a CNN is a tensor of 
the form #inputs × input height × input width × input channels. The height and 
width parameters correspond to the size of the original images. The number of input 
channels is typically three (red, green, and blue) for images. 

Each of the hidden layers can be one of: (i) a convolutional layer, (ii) a pooling 
layer, or (iii) a fully connected layer. A convolutional layer takes as input an image, 
or the output of another layer, and outputs a feature map. This produces a tensor of 
the form #inputs × feature height × feature width × feature channels. Each 
neuron of a CNN processes only a small region of the input. This region is called 
the receptive field. It convolves this input and passes it on to the next layer. Pooling 
layers are used to reduce the dimensionality of the data. They do so by aggregating 
the outputs of neurons in the previous (convolutional) layer. Pooling strategies can 
be local (operating on a small subset of neurons), or global (operating on the entire 
feature map). Common pooling functions include max and average. In fully connected 
layers, outputs of neurons are connected to every single neuron in the next layer. They 
are often used as the penultimate layer before the output layer, where all weights are 
combined to compute the prediction (i.e., the output). A neural network with only 
fully connected layers is also called a Multilayer Perceptron (MLP). From this 
point of view, CNNs are regularized forms of MLPs. 

There are a number of parameters associated with CNNs that must be tuned. Spe- 
cific to convolutional layers, the common parameters are stride, depth, and padding. 
The depth parameter of the output volume controls the number of neurons in a layer 
that connect to the same region of the input volume. Stride controls the translation 
step of the convolution filter. Padding allows the augmentation of input with zeros at 
the border of the input volume. Other parameters include kernel size and pooling size. 
Kernel size specifies the number of pixels that are processed together, whereas pooling 
size controls the extent of down-sampling. Typical values for both are 2 × 2 in common 
image processing networks. 
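The roles of kernel size, stride, padding, and pooling size can be made concrete with a single-channel sketch. The function names are hypothetical, and, as in most deep learning libraries, the "convolution" below is implemented as a sliding dot product without flipping the kernel.

```python
import numpy as np


def output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of filter positions along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv2d(image, kernel, stride=1, padding=0):
    padded = np.pad(image, padding)              # zero padding on all borders
    kh, kw = kernel.shape
    out_h = output_size(image.shape[0], kh, stride, padding)
    out_w = output_size(image.shape[1], kw, stride, padding)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = padded[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # one receptive field per output
    return out


def max_pool2d(feature_map, size=2):
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))
```

For instance, a 3 × 3 kernel with stride 1 and padding 1 preserves a 32 × 32 input size, while 2 × 2 max pooling halves each spatial dimension.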

In addition to parameter tuning, regularization is also required to design robust 
CNNs. Beyond the generic methods for regularization mentioned earlier (such as 
early stopping and L1/L2 regularization), there are approaches specific to neural 
networks. Dropout is a common measure taken to regularize neural networks. 
Fully-connected networks (or MLPs) are prone to overfitting, because of the large 
number of connections. An intuitive way to resolve this issue is to leave out 
individual nodes (and the corresponding inbound and outbound edges) from the training 
procedure. Each node is left out with a probability p (p is usually set to 0.5). During 
the testing phase, the expected value of the weights is computed from different 
versions of the dropped-out network. Other simple, CNN-specific parameter tuning 
techniques limit the number of units in hidden layers, the number of hidden layers, and 
the number of channels in each layer. 
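Dropout can be sketched as a random mask applied to the activations of a layer. The variant below is the commonly used "inverted" dropout, which is an assumption on our part: surviving activations are rescaled during training so that their expected value is unchanged, and no averaging is needed at test time.

```python
import numpy as np


def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob during training."""
    if not training or drop_prob == 0.0:
        return activations                       # test phase: use all nodes
    keep = rng.random(activations.shape) >= drop_prob
    return activations * keep / (1.0 - drop_prob)
```

Here `rng` is a NumPy random generator, e.g., `np.random.default_rng(0)`; with `drop_prob = 0.5` roughly half of the nodes are left out in every training step.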

Commonly used CNN architectures include LeNet (LeCun et al. 1989), AlexNet 
(Krizhevsky et al. 2012), ResNet (He et al. 2016), Wide ResNet (Zagoruyko and 
Komodakis 2016), GoogleNet (Szegedy et al. 2015), VGG (Simonyan and Zisserman 
2014), DenseNet (Huang et al. 2017), and Inception (v2 (Szegedy et al. 2016), v3 
(Szegedy et al. 2016), v4 (Szegedy et al. 2017)). 


3.2.4 Recurrent Neural Networks 


A Recurrent Neural Network (RNN) is a neural network in which nodes have inter- 
nal hidden states, or memory. RNNs can therefore process (temporal) sequences of 
inputs. They are typically used in the analysis of speech signals, language translation, 
and handwriting recognition, and more recently in prediction of atomic trajectories 
in molecular dynamics simulations. 

A key feature of RNNs is the ability to share intermediate outputs across different 
parts of the model. Given a sequence of inputs $[x^{(1)}, x^{(2)}, \ldots, x^{(n)}]$, 
the state of the RNN at time $t$ is given as 

$$h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta) \tag{7}$$

where $f(\cdot)$ is the recurrent function, and $\theta$ is the set of parameters shared 
across time steps. From Eq. 7, one can see that RNNs predict the future on the basis of 
the past outcomes. 
A generic RNN can, in theory, remember arbitrarily long-term dependencies. In 
practice, repeated use of back-propagation causes gradients to vanish (i.e., tend to 
zero), or explode (i.e., tend to infinity). Gated RNNs are designed to circumvent 
these issues. The most widely used Gated RNNs are Long Short-Term Memory 
(LSTM) (Hochreiter and Schmidhuber 1997; Gers et al. 2000) and Gated Recurrent 
Unit (GRU) (Cho et al. 2014). Recall that a regular activation neuron consists of 
a non-linear function applied to a linear transformation of the input. In addition to 
this, LSTMs have an internal cell-state (different from the hidden-state recurrence 
previously discussed), and a gating mechanism that controls the flow of information. 
In all, LSTMs have three gates: an input gate, a forget gate, and an output gate. 
Specifically, the forget gate allows a network to forget old states that have 
accumulated over time, thereby preventing vanishing gradients. GRUs are similar to 
LSTMs, but with a simplified gating architecture. GRUs combine the LSTM’s input and 
forget gates into a single update gate, add a reset gate, and merge the hidden and 
cell states. This results in a simpler architecture that requires fewer tensor 
operations. The problem of exploding gradients is handled by gradient clipping. Two 
common strategies in gradient clipping are: (i) value clipping, in which values above 
and below set thresholds are set to the respective thresholds, and (ii) norm clipping, 
in which the gradient is rescaled so that its norm does not exceed a chosen threshold. 
Using CNNs and RNNs as building blocks, we can develop 
complex NN frameworks such as Generative Adversarial Networks (GANs). 
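The recurrence of Eq. 7 and the two gradient clipping strategies can be sketched as follows. The tanh recurrent function and the parameter names `W_h`, `W_x`, and `b` are assumptions for a plain (ungated) RNN; LSTM and GRU cells replace this update with their gated counterparts.

```python
import numpy as np


def rnn_states(inputs, W_h, W_x, b, h0):
    """Apply the same parameters at every time step, carrying the hidden state."""
    h, states = h0, []
    for x in inputs:
        h = np.tanh(W_h @ h + W_x @ x + b)
        states.append(h)
    return states


def clip_by_value(grad, threshold):
    """Value clipping: clamp every component to [-threshold, threshold]."""
    return np.clip(grad, -threshold, threshold)


def clip_by_norm(grad, max_norm):
    """Norm clipping: rescale the whole gradient if its norm is too large."""
    norm = np.linalg.norm(grad)
    return grad if norm <= max_norm else grad * (max_norm / norm)
```

Value clipping can change the direction of the gradient, whereas norm clipping preserves the direction and only shortens the step.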


3.2.5 Generative Adversarial Networks 


A Generative Adversarial Network (GAN) is a neural network in which a zero-sum 
game is contested by two neural networks—the generative network and the discrim- 
inative network (Goodfellow et al. 2014). The generative network learns to map a 
pre-defined latent space to the distribution of the dataset, whereas a discriminative 
network is used to predict whether an input instance is truly from the dataset or if it 
is the output of the generative network. The objective of the generative network is to 
fool the discriminative network (i.e., increase error of the discriminative network), 
whereas the objective of the discriminative network is to correctly identify true data. 
The training procedure for a GAN is as follows: first, the discriminative network is 
given several instances from the dataset, so that it learns the “true” distribution. The 
generative network is initially seeded with a random input. From there, the generative 
network creates candidates with the objective of fooling the discriminative network. 
Both networks have separate back-propagation procedures; the discriminator learns 
to distinguish the two sources of inputs, even as the generative network produces 
increasingly realistic data. 

GANs have found a number of applications in synthesis of (realistic) datasets. 
They have been successful in creating art, synthesizing virtual environments, generating 
photographs of synthetic faces, and designing animation characters. GANs are often 
used for the purpose of transfer learning, where knowledge obtained from training 
in one application can be used in another similar, but different application. 


3.2.6 Transfer Learning 


Traditional machine learning is isolated, in that a model is trained in a very specific 
context, to perform a targeted task. The key idea in transfer-learning is that new tasks 
learn from the knowledge gained in a previously trained task (Weiss et al. 2016). 
To formally define Transfer Learning, we first define domain and task. Let $\mathcal{X}$ 
be a feature space, and $X$ be the dataset (i.e., $X = [x_1, x_2, \ldots, x_n] \in \mathcal{X}$). 
Similarly, let $\mathcal{Y}$ be the label space and $Y = \{y_1, y_2, \ldots, y_n\} \in \mathcal{Y}$ 
be the labels corresponding to the rows of $X$. Further, let $P(\cdot)$ denote a 
probability distribution. A domain is defined as $D = \{\mathcal{X}, P(X)\}$. Given a 
domain $D$, a task $T$ is defined as $T = \{\mathcal{Y}, P(Y \mid X)\}$. Given source and 
target domains $D_S$ and $D_T$ and corresponding tasks $T_S$ and $T_T$, transfer learning 
aims to learn $P(Y_T \mid X_T)$ using information from $D_S$ and $D_T$. In this setup, we 
can see that there are four possibilities: (i) $\mathcal{X}_S \neq \mathcal{X}_T$, 
(ii) $\mathcal{Y}_S \neq \mathcal{Y}_T$, (iii) $P(X_S) \neq P(X_T)$, or 
(iv) $P(Y_S \mid X_S) \neq P(Y_T \mid X_T)$. In (i), the feature spaces of the source and 
target domains are different. In (ii), the label spaces of the tasks are different, 
which happens in conjunction with (iv), where the conditional probabilities of labels 
are different. In (iii), the feature spaces of source and target domains are the same, 
while the marginal probabilities are different. Case (iii) is interesting for 
simulations, because the feature spaces for source (simulation) and target (reality) 
are typically the same, but the marginal probabilities of observations in simulation 
and reality can be very different. 


3.3 Software Infrastructure for Machine Learning Applications 


A number of software packages and libraries have been developed over the last 
decade in support of ML applications in different contexts. Matrix computations are 
often performed using NumPy (Python) (Harris et al. 2020), Eigen ( et al. 2010), 
and Armadillo (C++) (Sanderson and Curtin 2016; Sanderson and Curtin 2020). 
Standard machine learning methods, including clustering such as k-means clustering 
and DBSCAN, classification algorithms such as SVM and LDA, regression, and 
dimensionality reduction are available in Python packages such as SciPy (Virtanen 
et al. 2020) and Theano (Theano Development Team 2016), and in C++ packages such 
as MLPack (Curtin et al. 2018). Deep learning approaches are often implemented 
using libraries such as PyTorch (Paszke et al. 2019), TensorFlow (Abadi et al. 2015), 
Caffe (Jia et al. 2014), Microsoft Cognitive Toolkit (CNTK), and DyNet (Neubig et al. 2017). 
We note that a number of machine learning packages written in a source language 
have readily available interfaces for other languages. For example, Caffe is written 
in C++, with interfaces available for both Python and MATLAB. Finally, we also 
note that Julia has wrappers for a number of the Python and C++ libraries. 


4 ML Applications in Reactive Atomistic Simulations 


Building on our basic toolkit of ML models and methods, we now describe recent 
advances in the use of ML techniques in reactive atomistic simulations. We focus on 
three core challenges—use of ML techniques for training highly accurate atomistic 
interaction models, use of ML techniques in accelerating simulations, and use of 
ML methods for analysis of atomistic trajectories. Our discussion applies broadly 
to particle methods; however, we use reactive atomistic simulations as our model 
problem. In particular, we use ReaxFF as the force field for simulations. 


4.1 ML Techniques for Training Reactive Atomistic Models 


Optimization of force-field parameters for target systems of interest is crucial for high 
fidelity in simulations. However, such optimizations cannot be specific to the sets of 
molecules present in the target system for two reasons: (i) utility of a parameter set 
that only works for a particular system is marginal; and (ii) in a reactive simulation, 
molecular composition of a system is expected to change as a result of the reactions 
during the course of a simulation. For this reason, reactive force field optimizations 
are performed at the level of groups of atoms, e.g. Ni/C/H, Si/O/H, etc. Nevertheless, 
the behaviour of a given group of atoms may show variations in different contexts such 
as combustion, aqueous systems, condensed matter phase systems, and biochemical 
processes. Therefore, it may be desirable to create parameter sets optimized for 
different contexts (Senftle et al. 2016). 

Reactive force fields such as ReaxFF are complex, with a large number of param- 
eters that can be grouped by charge equilibration parameters, bond order parameters, 
and parameters based on N-body interaction (e.g., single-body, two-body, three-body, 
four-body and non-bonded) in addition to the system-wide global parameters. As the 
number of elements in a parameter set increases, force field optimization quickly 
becomes a challenging problem due to the high dimensionality and discrete nature 
of the problem. Several methods and software systems have been developed for force 
field optimization over the years, starting with more traditional methods early on and 
moving to ML-based methods more recently. After giving an overview of the force 
field optimization problem, we briefly review traditional methods first and then dis- 
cuss the ML-based techniques, which mainly draw upon Genetic Algorithms (see 
Sect. 2.2) as well as the extensive ML software infrastructure that has been built 
recently (see Sect. 3.3). 


4.1.1 Training Data and Validation Procedures 


Training procedures for typical force fields require three inputs: (i) model parameters 
to be optimized; (ii) geometries, a set of atom clusters that describe the key char- 
acteristics of the system of interest (e.g., bond stretching, angle and torsion scans, 
reaction transition states, crystal structures, etc.); and (iii) training data, chemical 
and physical properties associated with these atom clusters (such as energy-minimized 
structures, relative energies for bond/angle/torsion scans, partial charges 
and forces), which are typically obtained from high-fidelity quantum mechanical 
(QM) models or sometimes experiments, along with a function that combines these 
different types of training items into a quantifiable fitness value: 


$$\mathrm{Error}(m) = \sum_{i=1}^{N} \left( \frac{x_i - y_i}{\sigma_i} \right)^2 \tag{8}$$

In Eq. 8, $m$ represents the model with a given set of force field parameter values, 
$N$ is the number of training items, $x_i$ is the predicted training data value 
calculated using the model $m$, $y_i$ is the ground truth value of the corresponding 
training data item, and $\sigma_i$ is the weight assigned to each training item. 
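The fitness function of Eq. 8 translates directly into code. The function name `training_error` and the plain-list inputs are illustrative; in an actual workflow the predicted values would come from force field evaluations on the training geometries.

```python
def training_error(predicted, reference, weights):
    """Sum of squared, weight-normalized deviations over all training items."""
    return sum(((x - y) / s) ** 2 for x, y, s in zip(predicted, reference, weights))
```

For example, a charge predicted as 1.0 against a reference of 0.5 with weight 0.5, together with an energy predicted as 52 against a reference of 50 with weight 2, gives an error of 1 + 1 = 2. A smaller weight value therefore makes a training item count more strongly.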

Table 1 summarizes commonly used training data types and provides some exam- 
ples. An energy-based training data item uses a linear relationship of different 
molecules (expressed through their identifiers) because relative energies rather than 
the absolute energies drive the chemical and physical processes. For structural items, 
geometries must be energy minimized as accurate prediction of the lowest energy 
states is crucial. For other training item types, energy minimization is optional, but 
usually preferred. 


4.1.2 Global Methods for Reactive Force Field Optimization 


Table 1 Examples for commonly used training items. Identifiers (e.g., ID1) refer to structures/molecules 

| Type | Training item | Target | Description |
|---|---|---|---|
| Charge | ID1 1 | 0.5 | Charge for atom 1 (in elementary charge) |
| Energy | ID1 − ID2/2 − ID3/3 | 50 | Energy difference (in kcal/mol) |
| | ID1 | −150 | |
| | ID3/2 − ID1/3 | 30 | |
| Geometry | ID1 1 2 | 1.25 | Distance between atoms 1 and 2 (in Å) |
| | ID2 1 2 3 | 120 | Valence angle between atoms 1, 2 and 3 (in degrees) |
| | ID3 1 2 3 4 | 170 | Torsion angle between atoms 1, 2, 3 and 4 (in degrees) |
| Force | ID1 1 | 0.5 0.5 0.5 | Forces on atom 1 (in kcal/mol/Å) |
| | ID2 | 1.0 | RMSG (in kcal/mol/Å) |

The earliest ReaxFF optimization tool is the sequential parabolic parameter interpo- 
lation method (SOPPI) (van Duin et al. 1994). SOPPI uses a one-parameter-at-a-time 
approach, where consecutive single parameter searches are performed until a certain 
convergence criterion is met. The algorithm is simple, but as the number of parameters 
increases, the number of one-parameter optimization steps needed for convergence 
increases drastically. Furthermore, the success of this method is highly dependent 
on the initial guess and the order of the parameters to be optimized. 
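The one-parameter-at-a-time idea can be illustrated with a generic sketch in which each parameter is updated by fitting a parabola through three error evaluations. This is not the SOPPI code itself: the function names, the fixed probing step, and the stopping rule are assumptions made for illustration.

```python
def parabolic_step(x, f_minus, f_mid, f_plus, step):
    """Vertex of the parabola through the points x - step, x, x + step."""
    curvature = f_plus - 2.0 * f_mid + f_minus
    if curvature <= 0.0:
        # No interior minimum: move one step toward the lower side.
        return x - step if f_minus < f_plus else x + step
    return x - 0.5 * step * (f_plus - f_minus) / curvature


def one_parameter_search(error, params, step=0.1, tol=1e-10, max_sweeps=200):
    params = list(params)
    best = error(params)
    for _ in range(max_sweeps):
        previous = best
        for i in range(len(params)):             # one parameter at a time
            def trial(value, i=i):
                return error(params[:i] + [value] + params[i + 1:])
            x = params[i]
            candidate = parabolic_step(x, trial(x - step), best, trial(x + step), step)
            candidate_error = trial(candidate)
            if candidate_error < best:           # keep only improving moves
                params[i], best = candidate, candidate_error
        if previous - best < tol:                # a full sweep no longer helps
            break
    return params, best
```

Each sweep costs three error evaluations per parameter, which makes the growth in cost with the number of parameters, and the dependence on parameter order, easy to see.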

Due to the drawbacks of SOPPI, various global methods such as genetic or evo- 
lutionary algorithms (Dittner et al. 2015; Jaramillo-Botero et al. 2014; Larsson et al. 
2013; Trnka et al. 2018), simulated annealing (SA) (Hubin et al. 2016; Iype et al. 
2013) and particle swarm optimization (PSO) (Furman et al. 2018) have been inves- 
tigated for force field optimization. We discuss some of the promising techniques 
below. 

Genetic Algorithms (GA) often work well for global optimization because via 
crossover they can exploit (partial) separability of the optimization problem even in 
the absence of any explicit knowledge about its presence. They are also able to make 
long-range “jumps” in search space. Due to the continuous presence of multiple indi- 
viduals that have survived several selection rounds it is ensured that these “jumps,” 
based on information interchange between individuals, have a high probability of 
landing at new, promising locations. Last but not least, by admitting operators other 
than the classic crossover and mutation steps, it is possible to extend GAs within this 
abstract meta-heuristic framework with desirable features of other global optimiza- 
tion strategies, too. GAs are especially useful when dealing with challenging and 
time-critical optimization problems. The straightforward parallelism and intrinsic 
high scalability property of GAs provide an advantage over other strategies that are 
either serial in nature or where parallelization facilitates decoupled or only loosely 
coupled task-level parallelism. An efficient and scalable implementation of GAs for 
ReaxFF is provided in the ogolem-spuremd software (Dittner et al. 2015), where 
the authors demonstrate convergence to fitness values similar to or better than those 
reported in the literature in a matter of a few hours of execution time through effective 
use of high-performance computers and advanced GA techniques. 
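A bare-bones genetic algorithm for minimizing an error function over bounded parameters is sketched below. It is a generic illustration and does not reproduce the ogolem-spuremd implementation: the function name, truncation selection, uniform crossover, Gaussian mutation, and all default settings are assumptions.

```python
import random


def genetic_minimize(error, bounds, pop_size=40, n_generations=100,
                     mutation_scale=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(n_generations):
        population.sort(key=error)
        survivors = population[:pop_size // 2]   # selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            # Uniform crossover: each gene is inherited from either parent.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Gaussian mutation, clamped to the parameter bounds.
            child = [min(hi, max(lo, gene + rng.gauss(0.0, mutation_scale * (hi - lo))))
                     for gene, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = survivors + children
    return min(population, key=error)
```

The error evaluations of the individuals within a generation are independent of one another, which is the source of the straightforward parallelism mentioned above.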

Recently, other population-based global ReaxFF optimization methods have been 
proposed, such as the particle swarm optimization algorithm RiPSOGM (Furman 
et al. 2018), the covariance matrix adaptation evolution strategy (CMA-ES) (Shchygol 
et al. 2019), and the KVIK optimizer (Gaissmaier et al. 2022). Shchygol et al. 
(2019) explore different optimization choices for the CMA-ES method, the ogolem- 
spuremd software, as well as a Monte-Carlo force field optimizer (MCFF), and they 
systematically compare these techniques using three training sets from literature. 
Their CMA-ES method is an implementation of the stochastic gradient-free opti- 
mization algorithm proposed by Hansen (2006), where the main idea is to iteratively 
improve a multi-variate normal distribution in the parameter space to find a distri- 
bution whose random samples minimize the objective function starting from a user 
provided initial guess. The MCFF technique is based on the simulated annealing 
algorithm to optimize a given set of parameters. In every iteration, MCFF makes 
a small random change to the parameter vector and computes the corresponding 
change in the error function. Any change that reduces the error is accepted; changes 
that increase the error are accepted with a predetermined probability. With suffi- 
ciently small random changes and acceptance rates, MCFF can become a rigorous 
global optimization method, but at very high computational cost. Through extensive 
benchmarking, Shchygol et al. conclude that while CMA-ES can often converge 
to the lowest error rates, it cannot do this on a consistent basis. The GA method 
employed by ogolem-spuremd can produce consistently good (but not necessarily 
the lowest) error rates, but at higher computational costs compared to CMA-ES. 
Overall, they have found MCFF to underperform compared to CMA-ES and GA for 
similar computational costs. 
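
The accept/reject loop described above for MCFF can be sketched in a few lines of Python. This is an illustration only, not the MCFF code: the toy error function `toy_error`, the geometric cooling schedule, and the Metropolis acceptance probability exp(-delta/T) are assumptions made here to obtain a runnable example, and a fixed acceptance probability could be substituted for the last of these.

```python
import math
import random


def anneal(error_fn, params, step=0.05, t_start=1.0, t_end=1e-3, n_iter=5000, seed=0):
    """Simulated-annealing search in the spirit of MCFF (illustrative only)."""
    rng = random.Random(seed)
    current = list(params)
    current_err = error_fn(current)
    best, best_err = list(current), current_err
    for it in range(n_iter):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (it / (n_iter - 1))
        # Small random change to one randomly chosen parameter.
        trial = list(current)
        k = rng.randrange(len(trial))
        trial[k] += rng.uniform(-step, step)
        delta = error_fn(trial) - current_err
        # Improvements are always accepted; uphill moves only with some probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_err = trial, current_err + delta
            if current_err < best_err:
                best, best_err = list(current), current_err
    return best, best_err


def toy_error(p):
    # Hypothetical stand-in for a force-field training error.
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2 + 0.1 * math.sin(5.0 * p[0]) ** 2


best_params, best_error = anneal(toy_error, [0.0, 0.0])
```

Every iteration costs one evaluation of the error function, which is why the number of iterations dominates the cost when each evaluation involves energy calculations over a full training set.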


4.1.3 Machine Learning Based Search Methods 


While global methods have proven successful for force-field optimization, the
absence of gradient information means that they require a large number of
potential energy evaluations and can therefore be very costly. With
the emergence of advanced tools to calculate the gradients of complex functions
automatically, machine learning based techniques for optimization of force fields
have attracted interest.


iReaxFF: One of the earliest such attempts is the Intelligent-ReaxFF (iReaxFF)
software (Guo et al. 2020). iReaxFF uses the TensorFlow library to calculate
gradient information automatically and then applies local optimizers such as Adam or BFGS. An
additional benefit of the TensorFlow implementation is that iReaxFF can
automatically leverage GPU acceleration. However, iReaxFF does not have the expected
flexibility in terms of the training data as it can only be trained to match the ReaxFF 
energies to the absolute energies from Density Functional Theory (DFT) computa- 
tions on the training data; relative energies, charges or geometry optimizations cannot 
be used in the training, which limits its usability. As iReaxFF tries to exactly
match the energies of the training data, the transferability of force fields generated by
iReaxFF is also limited. While it is not clearly stated what kind of gradient information
is calculated using TensorFlow, their definition of the loss function (which is the
sum of the squared differences between absolute DFT and ReaxFF energies) suggests 
that their gradients are calculated with respect to atomic positions, which essentially 
amounts to performing a force matching based force field optimization. The number 
of iterations required to reach the desired accuracies for their test cases is rather 
large, on the order of tens to hundreds of thousands of iterations. Even with GPU acceleration,
training a single test case reportedly takes several days. This is partly because
iReaxFF does not filter out the unnecessary 2-body, 3-body and 4-body interactions 
before the optimization step. 


JAX-ReaxFF: Another recent effort that relies on automatic differentiation is the
JAX-ReaxFF software (Kaymak et al. 2022). JAX is an automatic differentiation library
by Google, built on the XLA compiler, for high performance machine learning
research (Bradbury et al. 2020); it can automatically differentiate native Python
and NumPy functions. Leveraging this capability, JAX-ReaxFF automatically
calculates the derivative of a given fitness function with respect to the set of force
field parameters to be optimized from a Python-based implementation of the ReaxFF
potential energy terms. With gradient information available for the high dimensional
optimization space (which generally includes tens to over a hundred parameters),
JAX-ReaxFF can employ highly effective local optimization methods such as the
Limited Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm (Zhu
et al. 1997) and the Sequential Least Squares Programming (SLSQP) optimizer
(Kraft et al. 1988). Gradient information alone is not sufficient to prevent local
optimizers from getting stuck in a local minimum, but when combined with a
multi-start approach, JAX-ReaxFF can greatly improve the training efficiency (measured in
terms of the number of fitness function evaluations performed). As the authors demonstrate
through a diverse set of systems such as cobalt, silica, and disulfide, which were also
used in other related work, JAX-ReaxFF can reduce the number of optimization iterations
from tens to hundreds of thousands (as in CMA-ES, ogolem-spuremd or iReaxFF)
down to only a few tens of iterations.
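
The combination of a gradient-based local optimizer with multiple starting points can be illustrated with a toy problem. The sketch below is not JAX-ReaxFF code: the one-parameter `fitness` function is hypothetical, its derivative is written by hand where an automatic differentiation tool would supply it, plain gradient descent stands in for L-BFGS or SLSQP, and the starting points are evenly spaced rather than randomly drawn so that the result is reproducible.

```python
import math


def fitness(x):
    # Hypothetical one-parameter fitness with several local minima.
    return x * x + 3.0 * math.sin(3.0 * x)


def fitness_grad(x):
    # Hand-written derivative; an autodiff tool would generate this automatically.
    return 2.0 * x + 9.0 * math.cos(3.0 * x)


def local_descent(x, lr=0.01, n_steps=2000):
    """Plain gradient descent, standing in for L-BFGS or SLSQP."""
    for _ in range(n_steps):
        x -= lr * fitness_grad(x)
    return x


def multi_start(n_starts=9, low=-4.0, high=4.0):
    # Evenly spaced initial guesses; random starts are an equally valid choice.
    starts = [low + (high - low) * i / (n_starts - 1) for i in range(n_starts)]
    # Each start runs an independent local search and may end in a different minimum.
    candidates = [local_descent(x0) for x0 in starts]
    # Keep the local minimum with the lowest fitness.
    return min(candidates, key=fitness)


x_best = multi_start()
```

A single local search started at, say, x = 2 would stop in a shallower minimum; it is the comparison across starts that recovers the lowest one, at the cost of one local search per start.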

Another important advantage of JAX is its architectural portability enabled by the 
XLA technology (Sabne 2020) used under the hood. Hence, JAX-ReaxFF can run 
efficiently on various architectures, including graphics processing units (GPU) and 
tensor processing units (TPU), through automatic thread parallelization and vector 
processing. By making use of efficient vectorization techniques and carefully trim- 
ming the 3-body and 4-body interaction lists, JAX-ReaxFF can reduce the overall 
training time by up to three orders of magnitude (down to a few minutes on GPUs) 
compared to the existing global optimization schemes, while achieving similar (or 
better) fitness scores. The force fields produced by JAX-ReaxFF have been validated 
by measuring the macroscale properties (such as density and radial distribution func- 
tions) of their target systems. 
Beyond speeding up force field optimization, the Python based JAX-ReaxFF
software provides an ideal sandbox environment for domain scientists, as they can 
move beyond parameter optimization and start experimenting with the functional 
forms of the interactions in the model, add new types of interactions or remove 
existing interactions as desired. Since the negative gradient of the new functional
forms with respect to atom positions gives the forces, scientists are freed from the burden
of coding the lengthy and error-prone force calculation parts. Through automatic 
differentiation of the fitness function as explained above, parameter optimization for 
the new set of functional forms can be performed without any additional effort by 
the domain scientists. After parameter optimization, they can readily start running 
MD simulations to test the macro-scale properties predicted by the modified set of 
functional forms as a further validation test before production-scale simulations, or 
go back to editing the functional forms if desired results cannot be confirmed in this 
sandbox environment provided by JAX-ReaxFF. 
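
The idea that forces follow directly from the energy expression can be sketched without any differentiation library. In the example below, central finite differences stand in for automatic differentiation, and the Lennard-Jones form in `pair_energy` is a hypothetical placeholder for whatever functional form is being tested; neither is taken from JAX-ReaxFF.

```python
import math


def pair_energy(positions, epsilon=1.0, sigma=1.0):
    """Hypothetical functional form: a Lennard-Jones sum over all atom pairs."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def forces(energy_fn, positions, h=1e-5):
    """Forces as the negative gradient of the energy, by central differences."""
    result = []
    for i in range(len(positions)):
        f_i = []
        for d in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][d] += h
            minus[i][d] -= h
            f_i.append(-(energy_fn(plus) - energy_fn(minus)) / (2.0 * h))
        result.append(f_i)
    return result


atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
f = forces(pair_energy, atoms)
```

Changing the body of `pair_energy` is all that is needed to obtain forces for a different functional form, which is the convenience the text attributes to automatic differentiation (there without the finite-difference error and at much lower cost).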


4.2 Accelerating Reactive Simulations 


We now discuss how ML techniques can be directly used to accelerate reactive 
simulations and to improve their accuracy in different application contexts. 


4.2.1 Machine Learning Potentials 


At a high level, ML based potentials can be defined as follows (Behler 2016): 


1. The potential must establish a direct functional relation between atomic config- 
uration and the corresponding energy, where the functional must be based on an 
ML model. As an example, a forward propagating deep neural network may serve 
as a functional, where input is the atomic configuration and output is the energy. 

2. Any physical approximations or theoretically grounded constraints are explicitly 
incorporated into the training data and are not part of the energy functional. 


The second requirement in the definition distinguishes traditional fixed form poten- 
tials from the ML potentials. It also ensures that for a “sufficiently complex” energy 
functional and “sufficiently large and diverse” training set, an ML based potential can 
produce arbitrarily accurate model predictions. Often it is expected that the training 
data are generated using a consistent and specific set of methods. It has been observed
that mixing data from different QC techniques or experiments leads to poor learning
outcomes. The size of the training set depends on the computational cost of generating
the training data and the accuracy expected of the ML model.

As with most traditional fixed form potentials, the ML potential energy is expressed
as a sum of local energies:

$$E = \sum_{i} E_i,$$

where the local energy $E_i$ corresponds to the ML energy, which depends on the local
neighborhood of the ith atom. The chemical environment of an atom is primarily decided
by short range interactions (Kohn 1996). The long range interactions, which decay
slower than $r^{-2}$, are usually either approximated as zero at the cutoff distance $R_c$ or
smoothly reduced to zero using tapering functions. As an example, polynomial tapering
functions are used in ReaxFF. The accuracy of such a model depends on the
cutoff distance $R_c$: larger values of $R_c$ lead to a better approximation of long range
interactions. However, a larger $R_c$ implies a larger atomic neighborhood (which grows
as $R_c^3$), which means that more sample points are required in the training set. Thus
$R_c$ must be chosen appropriately to provide a better long range approximation while
keeping the neighborhood size tractable.


4.2.2 Training Considerations 


ML potentials, like fixed form potentials, require training. Here we briefly explore
the steps and potential issues with the design and training of ML potentials (see,
e.g., Unke et al. 2021).


Choice of quantum methods used in generation of training data: Typically ML
based simulations are orders of magnitude slower than the equivalent fixed form
potential simulations (Brickel et al. 2019). However, unlike the fixed form
potentials, ML potentials may offer accuracy similar to that of an ab initio
method (Sauceda et al. 2020). Thus it is essential to choose an appropriate ab
initio method. On one hand, if the ab initio method is very fast and/or less accurate,
it defeats the purpose of further approximating these data with a machine-learned
model. On the other hand, a computationally expensive method such as CCSD(T)
makes it difficult to generate enough training data for ML models.

How much data? The amount of data needed depends on the size of the ML 
model, the desired accuracy, and the sampling technique used in producing the 
data set. 

Sampling: Sampling of training data over the domain of atomic configurations 
is crucial in achieving good training of the potentials. For the models designed 
to simulate equilibrium problems, one can potentially rely on samples that are 
output of an ab initio molecular dynamics simulation. Depending on the desired 
accuracy, generating such samples can become prohibitively expensive. Another 
alternative is to use metadynamics-type sampling techniques and generate samples
that are in the vicinity of the free energy minima of the system. However, if the 
model is intended to address chemical reactions or transition states, then a more 
uniform sampling is required where “rare events” are also sampled with relatively 
higher frequencies. The framework provided by an ML model does not include 
any “physics” of the problem, thus the training data must sample the configuration
space sufficiently to include the relevant “physics” in the problem. 

Training/validation and testing: In usual ML methodology, models are trained 
and tested against similarly structured but disjoint data sets. In this case, the 
training and the validation is performed on the data sets that are similarly sampled 
but distinct. However, the testing of the model is usually performed against bulk 
or physically measurable quantities computed using the trained models. Often 
the ML potential frameworks have hyperparameters that require a second level
of optimization. The testing phase must be repeated for different hyperparameter
values.


4.2.3 Descriptors 


Unique description of the atomic neighborhood is a central issue in structure-function
prediction problems in biophysics and materials science (Ghiringhelli et al. 2015;
Devillers and Balaban 1999; Valle and Oganov 2010). For ML systems, such uniqueness
is crucial for effective training. Thus, one must express any atomic neighborhood
in a representation that is invariant with respect to the action of the symmetry group 
of the system. In case of three dimensional atomistic systems, we have a group of 
Galilean transformations and discrete group of atomic permutations. We summa- 
rize commonly used descriptors, noting that the state of the art in this context is 
continually evolving. 


Atom Centered Symmetry Function (ACSF) 


This descriptor expresses the environment of the ith atom in terms of a Gaussian basis
of varying widths and an angular basis at different resolutions. It uses a cosine taper
function given by:

$$T_{R_c}(r_{ij}) = \begin{cases} \frac{1}{2}\left[\cos\left(\frac{\pi r_{ij}}{R_c}\right) + 1\right] & \text{for } r_{ij} \le R_c \\ 0 & \text{for } r_{ij} > R_c \end{cases} \qquad (9)$$

where $r_{ij}$ is the distance between the ith and jth particles. This ensures that, when
multiplied, the quantity goes smoothly to zero as $r_{ij}$ approaches $R_c$ from below.
Using this taper function, an atom centered descriptor can be written with radial and
angular parts as:
where n is the number of neighbors within the cutoff distance $R_c$ and $\lambda = \pm 1$. The descriptor
vector is generated by sampling the parameters $\eta$, $\zeta$, $\mu$, and $\lambda$. By design, ACSF
produces a description that is invariant under translation and rotation. We note that the
number of symmetry functions needed does not depend on n. However, the number
of symmetry functions grows very rapidly: typically 50-100 symmetry
functions per atom are used with various values of the parameters (Behler 2016). Further, the
number of functions required grows quadratically with the number of types
of atoms used in the model. ACSF can be generalized with additional weight functions
to improve resolution and complexity (Gastegger et al. 2017).
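
To make the construction concrete, the sketch below evaluates a radial symmetry function with the cosine taper of Eq. (9). It assumes the commonly used Behler-style radial form, $G_i = \sum_{j=1}^{n} \exp[-\eta (r_{ij} - \mu)^2]\, T_{R_c}(r_{ij})$, which may differ in detail from the form intended here; the function names, the cutoff of 6.0, and the grid of shifts are hypothetical choices, and the angular part is omitted.

```python
import math


def taper(r, r_c):
    """Cosine taper of Eq. (9): 1 at r = 0, falling smoothly to 0 at the cutoff."""
    if r > r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_acsf(center, neighbors, eta, mu, r_c):
    """Radial symmetry function for one (eta, mu) pair (assumed Behler-style form)."""
    total = 0.0
    for pos in neighbors:
        r = math.dist(center, pos)
        total += math.exp(-eta * (r - mu) ** 2) * taper(r, r_c)
    return total


def descriptor(center, neighbors, r_c=6.0, eta=1.0, shifts=(1.0, 2.0, 3.0, 4.0)):
    # Sampling mu on a grid gives a fixed-length vector, whatever the neighbor count.
    return [radial_acsf(center, neighbors, eta, mu, r_c) for mu in shifts]


env = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 7.0)]
g = descriptor((0.0, 0.0, 0.0), env)

# The same environment rotated by 90 degrees about z and translated by (5, 5, 5).
env_moved = [(5.0, 6.0, 5.0), (3.0, 5.0, 5.0), (5.0, 5.0, 12.0)]
g_moved = descriptor((5.0, 5.0, 5.0), env_moved)
```

Because only interatomic distances enter, `g` and `g_moved` agree, and the third neighbor, which lies beyond the cutoff, contributes nothing to either.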


Coulomb Matrix (CM) 


An alternate descriptor uses the Fourier transform of the Coulomb matrix (Rupp et al.
2012), which is defined as:

$$M_{ij} = \begin{cases} \frac{1}{2} Z_i^{2.4} & i = j \\ \dfrac{Z_i Z_j}{r_{ij}} & i \ne j \end{cases} \qquad (12)$$

where $Z_i$ is the charge on the ith particle. This descriptor is invariant under the
transformations listed; however, it is computationally expensive unless restricted to
a local Coulomb matrix (Rupp et al. 2012). The descriptor can be further generalized
to use the Ewald matrix instead of the Coulomb matrix (Faber et al. 2015).
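
A direct evaluation of the matrix in Eq. (12) is short enough to write out. The sketch below is a minimal illustration; the three-atom geometry and its units are hypothetical, and any further transformation of the matrix is left out.

```python
import math


def coulomb_matrix(charges, positions):
    """Build M_ij of Eq. (12) from charges Z_i and Cartesian positions."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return m


# Hypothetical water-like geometry: one atom with Z = 8 and two with Z = 1.
z = [8.0, 1.0, 1.0]
xyz = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
cm = coulomb_matrix(z, xyz)
```

The entries depend only on charges and interatomic distances, so the matrix is unchanged by translating or rotating the system, but its rows and columns still follow the atom ordering.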


Bispectral Coefficients (BC) 


In this descriptor, the atomic environment is represented as a local density that is
expressed in terms of spherical harmonics on a four dimensional sphere. The density
is written as a superposition of delta function densities using the taper function from
Eq. (9) as:

$$\rho_i(\mathbf{r}) = \delta(\mathbf{r}) + \sum_{r_{ij} < R_c} T_{R_c}(r_{ij})\, w_j\, \delta(\mathbf{r} - \mathbf{r}_{ij}), \qquad (13)$$

where the dimensionless parameter $w_j$ represents the atom type or other internal
properties of the jth atom. The angular part of such a density can be expanded in a spherical
harmonics basis and the radial part can be expanded in terms of a linear basis. The
radial part is transformed into an additional angle, converting the basis to spherical
harmonics on the 3-sphere. Let $U^{j}_{m,m'}$ be these hyper-spherical harmonics; then one can
express the local density as:

$$\rho_i(\mathbf{r}) = \sum_{j=0}^{\infty} \sum_{m,m'=-j}^{j} c^{j}_{m,m'}\, U^{j}_{m,m'}, \qquad (14)$$

where $c^{j}_{m,m'}$ are the coefficients of expansion. The $c^{j}_{m,m'}$ are computed by evaluating
the inner product $\langle U^{j}_{m,m'} | \rho_i \rangle$. The BC are then computed using the mixing rules as:
where $C^{jm}_{j_1 m_1 j_2 m_2}$ are the Clebsch-Gordan coefficients of mixing. These descriptors
also satisfy the required invariance properties. One key advantage of BC over ACSF
is that BCs can be systematically expanded or truncated based on accuracy versus
complexity trade-offs of the model (Thompson et al. 2015).


Smooth Overlap of Atomic Positions (SOAP) 


In the SOAP descriptor, the local density is generated by smoothing delta functions into
Gaussians as (Albert et al. 2013)

$$\rho^{i}_{\mathrm{SOAP}}(\mathbf{r}) = \sum_{j=1}^{N_i} \exp\left(-\frac{|\mathbf{r} - \mathbf{r}_{ij}|^2}{2\sigma^2}\right),$$

with $\sigma$ setting the width of the Gaussians. This density can be expanded in terms of
radial and angular bases as

$$\rho^{i}_{\mathrm{SOAP}}(\mathbf{r}) = \sum_{n,l,m} c^{i}_{n,l,m}\, g_n(r)\, Y_{l,m}(\theta, \phi),$$

where $Y_{l,m}(\theta, \phi)$ are the spherical harmonics basis, and $g_n(r)$ is a radial basis set chosen
based on the specific model. Thus the descriptor for atom i is written as an appropriately
normalized power spectrum

$$p_{n,n',l}(i) = \sum_{m} c^{i}_{n,l,m} \left(c^{i}_{n',l,m}\right)^{*}.$$


m 


4.2.4 Energy Functionals 


The input to the ML model is a descriptor using one of the models described above. 
The output of the ML model is an energy functional. We describe common forms of 
the energy functional here. 


Feed Forward Neural Network Based Energy Functional 


One of the common ML energy functionals is based on feed forward neural networks
(FFNN) (see e.g. Blank et al. (1995), Gassner et al. (1998), Lorenz et al. (2004),
Manzhos and Carrington (2006), Behler et al. (2007), Geiger and Dellago (2013),
Behler (2014), Behler (2015)). These networks typically use a descriptor as input and
produce an energy value as output. One can write the energy as:

$$E_i = g_m \circ g_{m-1} \circ \cdots \circ g_2 \left( f_1(b_1 + W_{0,1} \cdot G_i) \right),$$
$$h_k = g_k(h_{k-1}) = f_k(b_k + W_{k-1,k} \cdot h_{k-1}),$$

where the neural network has m layers, $W_{k-1,k}$ and $b_k$ are the weights and the bias values
associated with the kth layer respectively, and $f_k$ are the nonlinear activation functions
associated with the kth layer. Forces are computed as the negative gradients of the
energy functional. Thus we expect the activation functions $f_k$ to be differentiable
functions.
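
The layer-by-layer composition above translates directly into code. The following sketch evaluates the atomic energy for a small 3-2-1 network; the weights, biases, and input descriptor are hypothetical placeholder values (real ones come from training), and the choice of tanh for the hidden layer with a linear output layer is an assumption.

```python
import math


def dense(weights, bias, h, activation):
    """One layer: f_k(b_k + W_{k-1,k} . h_{k-1})."""
    return [activation(b + sum(w * x for w, x in zip(row, h)))
            for row, b in zip(weights, bias)]


def identity(x):
    return x


def atomic_energy(descriptor, layers):
    """Apply the layers in order, starting from the descriptor G_i."""
    h = list(descriptor)
    for weights, bias, activation in layers:
        h = dense(weights, bias, h, activation)
    return h[0]  # the last layer has a single output, the energy E_i


# Hypothetical parameters for a 3-2-1 network.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], math.tanh),
    ([[1.5, -0.7]], [0.2], identity),
]
e_i = atomic_energy([0.9, 0.1, 0.4], layers)
```

Since tanh is differentiable everywhere, the gradient of `e_i` with respect to the descriptor, and through it with respect to atomic positions, is well defined, which is the requirement stated above for computing forces.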


Gaussian Approximation Potential (GAP) 


This approximation establishes a mapping between the environment of an atom and
the corresponding energy using a Gaussian kernel function:

$$E_i = \sum_{n=1}^{N} \alpha_n\, G(\mathbf{b}, \mathbf{b}_n),$$

where L is the number of truncated bispectrum components and $\mathbf{b}$ are the BCs. The
determination of the coefficients $\alpha_n$ is computationally expensive, since it grows as
$N^2$ (Li et al. 2015).


Spectral Neighbour Analysis Potential (SNAP) 


SNAP simplifies the computation of the coefficients $\alpha_n$ in GAP by changing the problem
from Gaussian regression to a linear regression. The energy functional is now given by
(Thompson et al. 2015)

$$E_i = \beta_0^{w_i} + \sum_{k=1}^{M} \beta_k^{w_i}\, b_k^{i},$$

where M is the number of bispectrum coefficients used in the approximation and the
$\beta_k^{w_i}$ are linear coefficients that depend on the atom type $w_i$. The most
important advantage of SNAP over GAP is the simplification of computation due to
linear regression.


4.2.5 Accelerating Time-stepping Using Deep Networks 


We have previously described the use of ML potentials to increase the accuracy and 
scope of modeled interactions. An important bottleneck in reactive atomistic sim- 
ulations is the need for small timesteps (sub-femtoseconds in typical applications), 
whose sequential nature limits the temporal scope of simulations. There have been 
some recent efforts aimed at ML techniques for long-timestep integration. Conven- 
tional time-stepping schemes use the current atomic state (and in some cases, the few 
states leading up to the current state), combined with the force (derived from energy)
to advance system state to the next step. The goal of ML-based time integrators is 
to use a sequence of past atomic states, along with the energy, to predict system 
state over longer timesteps (e.g., three orders of magnitude longer than conventional 
integrators). 
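
For reference, the conventional update that such ML integrators aim to replace can be written in a few lines. The sketch below uses the velocity Verlet scheme for a single particle in one dimension; the harmonic `spring_force`, the unit mass, and the timestep are hypothetical choices made so that the result can be checked against the exact solution.

```python
import math


def velocity_verlet(x, v, force_fn, mass, dt, n_steps):
    """Advance one particle: each step uses only the current state and the force."""
    f = force_fn(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force_fn(x)                    # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v


def spring_force(x, k=1.0):
    # Hypothetical harmonic force (derived from the energy 0.5 * k * x**2).
    return -k * x


x_end, v_end = velocity_verlet(1.0, 0.0, spring_force, mass=1.0, dt=0.01, n_steps=1000)
x_exact = math.cos(10.0)                       # exact position at t = 10
energy = 0.5 * v_end ** 2 + 0.5 * x_end ** 2   # should stay close to 0.5
```

Each step depends on the previous one, so covering a total time of 10 at this timestep takes 1000 sequential force evaluations; a learned integrator that took steps a thousand times longer would need only one.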

The use of multiple past states in predicting the next state motivates the use 
of Recurrent Neural Networks (RNNs) for this task. Recall that RNNs use inter- 
nal states to process time-series data. To address the ‘vanishing gradient’ problem 
discussed in Sect. 3.2.2, RNN variants such as Long Short-Term Memory (LSTM) 
networks are used for this purpose. There are three key issues in the use of LSTMs 
in long time-step integrators: (i) specification of input states for the deep network; 
(ii) the network architecture; and (iii) the training process. The input to an LSTM-based
time integrator is typically limited to a finite region around the atom for which the
trajectory is predicted. Larger neighborhoods require a significantly larger number
of degrees of freedom in the network. While in theory this would improve accuracy,
the need for large amounts of training data and training error typically negate
this improvement in accuracy. The network architecture is determined by the com- 
plexity of the energy functional and specific domain properties. In current practice, 
even simple energy terms (Lennard-Jones interactions) require large networks (100K 
parameters) for ensembles of as few as 16 particles. The need for training data and 
associated training cost for these is significant. However, such integrators are shown 
to be capable of timesteps three orders of magnitude longer than conventional Verlet 
integrators (Kadupitiya et al. 2020). 

In current proposals, which are in relative infancy, the training procedures for the 
LSTMs use simulation data generated from the specific potential, with well specified 
boundary conditions (e.g., periodic boundaries). Even in these simple systems, a large 
amount of training data is needed to accurately predict trajectories. It is observed 
that for more complex potentials (with multiple terms) and diverse atomic contexts, 
the need for training data increases substantially. 

We note that the use of deep networks for particle dynamics is in relative infancy. 
There has been significant interest in the use of deep networks for time-integrating 
ODEs since the recent work of Chen et al. (2018). Recent advances include symplectic 
ODE-Nets for learning the dynamics of Hamiltonian systems (Zhong et al. 2019), 
and associated deep learning architectures (Rusch and Mishra 2021). 


5 Analyzing Results from Atomistic Simulations 


A key use of machine learning techniques is in the analysis of large amounts of data 
generated from time-dependent simulations. This data generally takes the form of 
snapshots of trajectories—with each snapshot corresponding to system state com- 
prised of degrees of freedom (position, momentum, etc.) associated with particles, 
and in the case of reactive simulations, bond information. Complex simulations scale 
to millions of particles and beyond, over billions of time-steps—leading to datasets 
that are in excess of terabytes. A number of techniques are deployed to deal with this
data volume, including subsampling for reducing storage, indexing for fast access, 
and compression. While these techniques facilitate storage and access, the focus 
of this section is primarily on analysis techniques that abstract and extract useful 
information from trajectories. 

We note that ML techniques for the analysis of time-dependent simulations are an active
area of research. This section summarizes the rich state of the art in the area; for
a more detailed recent summary, we refer readers to the excellent reviews by Glielmo
et al. (2021), Sidky et al. (2020), and Noé et al. (2020).


5.1 Representation Techniques 


We consider a general class of simulations that result in a set of T snapshots of data,
each snapshot $S_i$, $i = 0, \ldots, T-1$, stored as a D dimensional vector, in a matrix M
of dimension $T \times D$. The first challenge we face is to suitably encode the system state
at time $t_i$ into a corresponding vector $S_i$. This poses challenges w.r.t. different data
structures and their consistent encoding. We consider two common data structures
and associated representation techniques:


Vector Fields 


The most common data associated with particles is in vector fields. This includes 
position data, momentum, and other particle properties. The first step in represent- 
ing these vector fields is to account for underlying invariants. For instance, a par- 
ticle aggregate (e.g., a molecule) may be invariant under rotation and translation. 
To account for this invariance, these aggregates must be represented in a canonical 
framework so that two aggregates in different orientations can be viewed as being 
identical under affine transformations. The most common technique relies on aligning
particle aggregates with known reference aggregates (e.g., reference geometries
of molecules) and storing them as deviations from these reference molecules under
affine transformations. Such transformations can easily be computed through local 
formulations solved using Shapelets or global formulations such as the Orthogo- 
nal Procrustes Problem, which has an optimal solution due to Kabsch (1976). Once 
suitable alignments have been computed, the particle aggregates are stored as suit- 
able vectors of deviations from reference aggregates. When reference aggregates 
are unavailable, canonical representations can be derived through suitable internal 
representations, for example, in the form of internal distances between reference 
particles (e.g., distance between pairs of marked atoms in a molecule). This vector 
of distances provides a canonical representation. 
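
The internal-distance representation mentioned above is easy to demonstrate. In the sketch below, the three-atom aggregate, the rotation angle, and the translation vector are hypothetical; the point is only that the distance vector is identical before and after a rigid motion.

```python
import math
from itertools import combinations


def distance_vector(coords):
    """Canonical representation: distances between all pairs of marked atoms."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def rotate_z(p, angle):
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])


def translate(p, shift):
    return tuple(a + b for a, b in zip(p, shift))


# Hypothetical three-atom aggregate (coordinates in arbitrary units).
molecule = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = [translate(rotate_z(p, 0.7), (3.0, -1.0, 2.5)) for p in molecule]

d_ref = distance_vector(molecule)
d_moved = distance_vector(moved)
```

The two vectors agree to rounding error. The representation still depends on the order in which the marked atoms are listed, so a consistent atom ordering is assumed.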


Network Models 


Reactive simulations often store the bond structure of molecules within snapshots $S_i$.
These structures are invariant to within an isomorphism; i.e., any relabeling of atoms
in the molecule should be treated identically. Canonical labelings are challenging
because there exist an exponential number of permutations, and corresponding labelings.
Deriving canonical labelings to represent graphs corresponding to molecular
structures as vectors requires solution of the graph isomorphism problem. For small
molecules, this can be done by enumeration; however, for larger molecules, this is 
more computationally expensive. One solution to this problem relies on a diffusion 
kernel to derive canonical labelings. The Laplacian of the given graph structure is 
used to simulate a diffusion process on the graph. The stationary probabilities asso- 
ciated with the diffusion process are used to represent the graph in a canonical vector 
form. One may also view this vector in terms of the spectra of the graph. Other 
approaches to canonical labelings rely on graph neural networks (GNNs). These net- 
works are trained to input a given graph and to generate canonical labels as output. 
This training procedure for GNNs associates identical labelings with isomorphic
graphs.


5.2 Dimensionality Reduction and Clustering 


Using suitable representation techniques, state $S_i$ at timestep i is represented as a
vector $v_i$ in dimension $D_n$. We use subscript n to denote the native dimension of the
representation. The next step in typical analyses is to reduce the native dimension
$D_n$ to a lower (reduced) dimension $D_r$. This facilitates downstream analyses by
denoising data (filtering dimensions that are less important), while simultaneously 
reducing computational cost. Dimensionality reduction is accomplished through the 
linear (PCA, SVD, NMF, AA) or non-linear techniques (Kernel PCA, Autoencoders) 
described in Sect. 3.1. 


5.3 Dynamical Models and Analysis 


Molecular systems evolve through a dynamical operator acting on successive system 
states. This motivates the natural observation that the data-points associated with 
temporal snapshots are not independent; rather, they have temporal correlations that 
reveal interesting aspects of underlying systems. Identification of temporally coherent
subdomains is an important analysis task. The starting point for such analysis is
a time-lagged covariance matrix, which is computed from the normalized dot
product of a state descriptor at time t with that at time $t + \delta t$, for a suitably selected
lag $\delta t$. A commonly used method, Time Lagged Independent Component Analysis
(TL-ICA), uses this time-lagged covariance matrix, along with the covariance
matrix at the current state, to define a generalized eigenvalue problem. The eigenvectors
derived from this generalized eigenvalue problem correspond to the slow modes
in the underlying dynamics in the system. We refer to the work of Naritomi and 
Fuchigami (2013) for a detailed description of this method and its use in analysing 
atomic trajectories. These approaches are generalized into a variational framework 
that aims to characterize the dominant eigenpairs of the propagation operator
corresponding to the dynamical system. This is achieved by first computing a discrete
approximation to the propagation operator, which uses abstractions of the self and
time-lagged covariance matrices to compute transition probabilities for each state
at time t to a state at time $t + \delta t$. The eigenvectors of this operator correspond to
the dominant modes in the system. This general variational model is equivalent to 
TL-ICA if data points are represented through a linear basis. However, the variational 
model admits a more general basis, through the use of higher-order kernels and the 
underlying optimization problem is solved using conventional gradient-descent type 
methods. 
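
The generalized eigenvalue problem at the heart of TL-ICA can be illustrated in two dimensions, where it has a closed-form solution. The sketch below is a toy under stated assumptions: the trajectory is synthetic (one slow and one fast autoregressive mode, linearly mixed into two observed coordinates), the time-lagged covariance is symmetrized, and all function names are hypothetical.

```python
import math
import random


def mean_center(series):
    m = sum(series) / len(series)
    return [s - m for s in series]


def lagged_covariances(x, y, lag):
    """Return the 2x2 covariance C0 and the symmetrized time-lagged covariance Ct."""
    data = [mean_center(x), mean_center(y)]
    n = len(x) - lag
    c0 = [[sum(a[t] * b[t] for t in range(n)) / n for b in data] for a in data]
    ct = [[sum(a[t] * b[t + lag] + a[t + lag] * b[t] for t in range(n)) / (2 * n)
           for b in data] for a in data]
    return c0, ct


def generalized_eigenvalues(ct, c0):
    """Solve det(Ct - lam * C0) = 0 for a 2x2 symmetric pair (a quadratic in lam)."""
    a = c0[0][0] * c0[1][1] - c0[0][1] * c0[1][0]
    b = -(ct[0][0] * c0[1][1] + ct[1][1] * c0[0][0]
          - ct[0][1] * c0[1][0] - ct[1][0] * c0[0][1])
    c = ct[0][0] * ct[1][1] - ct[0][1] * ct[1][0]
    root = math.sqrt(max(b * b - 4.0 * a * c, 0.0))
    return sorted([(-b + root) / (2.0 * a), (-b - root) / (2.0 * a)], reverse=True)


# Synthetic trajectory: a slow mode and a fast mode, mixed into two observables.
rng = random.Random(0)
slow, fast = [0.0], [0.0]
for _ in range(20000):
    slow.append(0.95 * slow[-1] + rng.gauss(0.0, 1.0))
    fast.append(0.10 * fast[-1] + rng.gauss(0.0, 1.0))
x = [s + f for s, f in zip(slow, fast)]
y = [s - f for s, f in zip(slow, fast)]

c0, ct = lagged_covariances(x, y, lag=1)
eigs = generalized_eigenvalues(ct, c0)
```

The largest eigenvalue approximates the lag-1 autocorrelation of the slow mode and the smallest that of the fast mode, even though neither observed coordinate is a pure mode; this is the sense in which the leading eigenvectors pick out slow dynamics.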


5.4 Reaction Rates and Chemical Properties 


Reactive simulations often produce diverse chemical constituents. Some of these
compounds are transient; however, these still require careful analysis and
classification. In the simple case of a two component silica-water system, the molecular
components observed at the end of the simulations include Si-O, Si-O2, OH, H3
etc. (Fogarty et al. 2010). Identifying all the molecular components and the
corresponding chemical reactions is a difficult problem.

In order to enumerate all the molecular components, one can treat a simulation
time step as a colored graph, with the atom type as the color of each node and an edge
between two atoms whenever the bond order between the pair is
greater than a cutoff value. The enumeration then requires identification of all the
distinct classes of isomorphic subgraphs of atoms. Each such class entry is either a
molecule or a molecular fragment present in a single time frame. Then a hash table
of such fragments is constructed to record the frequency of occurrence of each reactant or
product in a single time frame.
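
The enumeration described above can be sketched as a connected-components search followed by a hash-table count. The version below makes a deliberate simplification: fragments are keyed by elemental composition rather than by a full isomorphism class, so isomers are not distinguished. The bond-order cutoff of 0.3, the input layout, and the example frame are hypothetical.

```python
from collections import Counter


def fragments(atom_types, bond_orders, cutoff=0.3):
    """Count molecular fragments in one frame.

    atom_types: list of element symbols, one per atom.
    bond_orders: dict mapping (i, j) atom-index pairs to a bond order.
    """
    neighbors = {i: [] for i in range(len(atom_types))}
    for (i, j), order in bond_orders.items():
        if order > cutoff:  # an edge exists only above the bond-order cutoff
            neighbors[i].append(j)
            neighbors[j].append(i)

    seen = set()
    counts = Counter()
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        # Simplification: key by composition, which cannot tell isomers apart.
        composition = Counter(atom_types[m] for m in members)
        formula = "".join(f"{el}{n}" for el, n in sorted(composition.items()))
        counts[formula] += 1
    return counts


types = ["O", "H", "H", "O", "H", "Si"]
bonds = {(0, 1): 0.9, (0, 2): 0.95, (3, 4): 0.8, (3, 5): 0.1}
frame_counts = fragments(types, bonds)
```

Here the weak Si-O contact falls below the cutoff, so the frame is counted as one H2O1 fragment, one H1O1 fragment, and an isolated Si atom. Repeating the count for every frame gives the time series of species populations.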

For the most common molecular fragments, it is often possible to identify reactions
of the kind $A + B \rightleftharpoons AB$. Such reactions can be modeled using first order differential
equations, which can be solved as:

$$N_{AB}(t) = \frac{K_f N}{K_f + K_b} \left(1 - \exp[-(K_f + K_b)(t - t_0)]\right), \qquad (16)$$

where N is the total number of molecules of type A and B, $N_{AB}$ is the number of
molecules of AB, and $K_f$, $K_b$ are the forward and backward reaction rates respectively
(Saunders et al. 2022). Within simulations the computed number of molecular types can
be fitted to Eq. (16) as a function of time, giving the reaction rates and equilibrium
concentrations of various chemical components.
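
A simple way to carry out such a fit is to linearize Eq. (16). The sketch below assumes that the simulation has run long enough for the last recorded count to serve as the equilibrium value; the synthetic data, the threshold of 0.05, and the function names are hypothetical, and a nonlinear least-squares fit would be the more robust choice for noisy counts.

```python
import math


def n_ab(t, k_f, k_b, n_total, t0=0.0):
    """Eq. (16): number of AB molecules at time t."""
    k = k_f + k_b
    return k_f * n_total / k * (1.0 - math.exp(-k * (t - t0)))


def fit_rates(times, counts, n_total, t0=0.0):
    """Recover (k_f, k_b), taking the last count as the equilibrium value."""
    n_eq = counts[-1]
    # Linearized form: ln(1 - N_AB / N_eq) = -(k_f + k_b) (t - t0).
    # The slope is fitted by least squares for a line through the origin.
    num = den = 0.0
    for t, n in zip(times, counts):
        frac = 1.0 - n / n_eq
        if frac > 0.05:  # skip points close to equilibrium, where the log is ill-conditioned
            num += (t - t0) * math.log(frac)
            den += (t - t0) ** 2
    k = -num / den
    k_f = k * n_eq / n_total  # from N_eq = k_f * N / (k_f + k_b)
    return k_f, k - k_f


# Synthetic counts generated from known rates, then fitted back.
times = [0.5 * i for i in range(101)]
counts = [n_ab(t, 0.3, 0.1, 1000.0) for t in times]
k_f_fit, k_b_fit = fit_rates(times, counts, n_total=1000.0)
```

The equilibrium count fixes the ratio of the two rates and the initial rise fixes their sum, so together they determine both rates.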
6 Concluding Remarks


In this chapter, we presented an overview of common ML techniques and formula- 
tions. We discussed how computationally expensive components of reactive atomistic 
simulations are formulated in ML frameworks, considerations for training ML models,
tradeoffs of accuracy, need for training data, transferability, and computational
cost. While we primarily focused on reactive atomistic simulations, the models and 
methods discussed apply more generally to discrete element models. 

The area of ML techniques for reactive simulations is extremely active and fluid. 
There is tremendous potential for significant new developments in the area, enabling 
simulation scales and scope far beyond those currently accessible. In doing so, these 
techniques hold the promise of new applications and domains. 
simulations. The new framework, composed from signature, measure, and decision 
building blocks with well-defined semantics, is tailored for parallel and distributed 
computing, has bounded communication and storage requirements, is generalizable 
to a variety of applications, and operates in an unsupervised fashion. We demonstrate 
the efficacy of our framework on several cases spanning scientific domains and 
applications of event detection: optimized input/output (I/O) in computational fluid 
dynamics simulations, detecting events that can lead to irreversible climate changes 
in simulations of polar ice sheets, and identifying optimal space-time subregions for 
projection-based model reduction. Additionally, we demonstrate the scalability of 
our framework using an HPC combustion application on the Cori supercomputer at 
the National Energy Research Scientific Computing Center (NERSC). 


1 Introduction 


Scientific investigations — whether computational, experimental or observational — 
are ever expanding to include larger sets of coupled physics spanning broader ranges 
of scales, and the volumes of data generated from these investigations consistently 
outpace the growth of computational and data storage resources. As a consequence, 
specifically in the area of HPC modeling and simulation, the process of mining sci- 
entific data to glean insight is shifting from one of a posteriori to one of in situ 
analysis, i.e., analysis performed simultaneously with a simulation while sharing 
resources with it. Capturing events of interest to scientists in complex, high-fidelity 
HPC simulations is difficult because it is rarely feasible to export the entire simula- 
tion state at every timestep. Crucial stages in the development of events can be lost 
between checkpoints, and ephemeral events can be missed altogether, making a pos- 
teriori event detection problematic. Identifying events in situ is equally challenging, 
as traditional analysis algorithms that assume global access to data require excessive 
communication bandwidth. 

Machine learning (ML) is being applied to scientific data for various purposes, 
including establishing constitutive laws, developing mathematically and statistically 
compact models of governing physics, identifying embedded patterns, dimensional- 
ity reduction, parameter importance and sensitivity analysis, and uncertainty quan- 
tification (UQ). In this work we focus on one specific application of ML: in situ event 
detection. Specifically, we seek to develop event detection algorithms that are: 


• Generalizable: deployable in a variety of different scientific computing domains 
without the need for application-specific tuning; 

• Unsupervised: able to operate without labeled examples defining events 
of interest; 

• Low Overhead: requiring minimal communication between processors; 

• Online: able to make predictions with minimal retention of data from prior 
timesteps. 
To motivate the main contributions of this chapter, we first provide a brief overview 
of related past work. 


1.1 Overview of Related Work 


Event detection is related to anomaly detection, since the purpose of each is to 
detect behavior that is locally different. There has been substantial previous research 
on developing streaming anomaly detection algorithms for HPC simulation data. 
However, many of these algorithms require significant communication between pro- 
cessors. For example, Wu et al. (2014) proposed the Random Subspace Forest (RS- 
Forest) algorithm in which decision trees with random splits and random thresholds 
are used to construct a density estimate over the data observations in a continuous 
feature space. While this algorithm is very fast for local or shared memory applica- 
tions, it is not communication efficient in this context because it requires sharing the 
entire RS-Forest data model across all processors. Similarly, Kernel Density Esti- 
mation (KDE) has been proposed for online anomaly detection (Ahmed 2009), but 
also requires significant communication between processors. 

Some anomaly detection methods have been designed for parallel implementation 
with low communication overhead. Zhao et al. (2009) proposed a parallel framework 
for k-means clustering that could be adapted for anomaly detection. However, k- 
means clustering requires a user-defined number of clusters k, and performance is 
often strongly dependent on the selected value of this variable. Such sensitivity to 
algorithm parameters is undesirable for unsupervised in situ event detection. 

Application-specific event detectors have also been developed. These include 
detectors to flag when ignition has occurred in combustion simulations (Bennett et al. 
2016) and tropical cyclone trackers for climate simulations (Ullrich and Zarzycki 
2016; Zhao et al. 2009). These algorithms make use of significant domain knowledge 
and are only applicable in the specific field for which they were developed, which is 
contrary to our goal of developing generalizable algorithms. 

Ensemble anomaly detection techniques, such as iForest (Liu et al. 2012) and 
iNNE (Bandaragoda et al. 2014), are often considered to be robust and highly gener- 
alizable. Furthermore, these techniques have been shown to be compatible with data 
sub-sampling. The disadvantage of these methods is that they require communication 
to share the ensemble model between processors. For large ensembles this overhead 
can be prohibitively high. 

Finally, it is not clear that conventional anomaly detection algorithms are well- 
suited for event detection in simulations. Because simulations often make use of 
highly refined meshes to resolve complex physical phenomena, an event of interest 
could occur over tens of thousands of mesh points, making it well-represented in 
the data, and therefore not anomalous. Moreover, comparisons to previous timesteps 
also are not straightforward, since many simulations exhibit significant drift over 
time: what is unusual at one timestep might become the norm later in time. 
1.2 Contributions and Organization 


We present herein a novel framework for applying ML to detect events of interest in 
situ in HPC simulation data. In this context, “events of interest” can be defined as 
any local dynamics in a region that differ significantly from the dynamics of other 
regions or timesteps. Our framework is tailored for parallel and distributed computing 
with the data typically representing a space-time domain of interest, with the spatial 
domain distributed across computing resources (processors/nodes) and data along 
the time dimension arriving in a streaming manner. 

Consider a region handled by a single processor exhibiting behavior that differs 
significantly from the regions on other processors. Such a region could be considered 
interesting even if the behavior persists over multiple timesteps. An example of this 
type of event could be a tropical cyclone that persists over many timesteps in a 
weather simulation but is geographically localized. We refer to events of this type as 
spatial events of interest. Conversely, a sudden change across all processors from one 
timestep to the next could also be considered interesting. An example of this type 
of event could be simultaneous ignition across an entire domain in a combustion 
simulation. We refer to these as temporal events of interest. 

This research presents a framework for developing in situ spatial and temporal 
event detection algorithms with tightly bounded communication and storage require- 
ments, composed from signature, measure, and decision building blocks with well- 
defined semantics. The goal of this framework is to facilitate event detection in a 
computationally scalable and efficient manner, while allowing the flexibility to com- 
pose a learning workflow best suited for the scientific domain and problem at hand. 
The proposed framework can be used not only to optimize I/O within an HPC simu- 
lation (by flagging the locations where events of interest occur so that only a subset 
of the simulation state is stored to disk), but also to detect scientifically meaning- 
ful phenomena within HPC simulations and even to improve a simulation’s accu- 
racy/efficiency. A detected event can be used as a trigger for mesh and/or timestep 
refinement, e.g., Adaptive Mesh Refinement (AMR) (Berger and Oliger 1984). 

The remainder of this chapter is organized as follows. The specific components of 
the proposed event detection framework are detailed in Sect. 2. In Sect. 3 we present 
results from three use cases that demonstrate the versatility and composability of the 
framework. The use cases span different scientific domains and different applications 
of event detection: optimized I/O in fluid flow simulations (Sect. 3.1), detecting events 
that are scientifically interesting in ice sheet simulations (Sect. 3.2), and identifying 
optimal space-time sub-regions for projection-based model reduction (Sect. 3.3). 
Section 3.4 presents results, using an exemplar turbulent combustion simulation, 
that demonstrate the scalability and computational efficiency of the framework when 
deployed in parallel computing simulations. Finally, conclusions are provided in 
Sect. 4. 
2 Approach 


Our framework for event detection is as follows. First, we assume a simulation 
domain with any number of dimensions. We further assume that the domain is 
divided into a set of $P$ analysis partitions, where each analysis partition $p_i$, 
$i = 0, \ldots, P - 1$, is a spatially contiguous subset of mesh points of the simulation 
domain. Each partition is always associated with a single processor, so that analysis 
partitions never straddle processor boundaries or migrate from one processor to 
another during the simulation. Thus, a single processor is responsible for one or more 
analysis partitions, with the size and number of partitions chosen based on the problem 
domain (Fig. 1). 

Next, we execute the following workflow at each timestep of the running simulation. 
For each analysis partition $p_i$ we compute a signature $s_i$, a fixed-length vector 
representing the simulation state within that partition where $|s_i| < |p_i|$ (Fig. 2). 
Conceptually, signatures are compressed, low-dimensional representations of an analysis 
partition’s content, and our intent is that the signature should contain crucial aspects 


Fig. 1 Example simulation domain (gray), split across processors (green), and divided into analysis 
partitions (blue). Panels: Simulation Domain, Processor Partitions, Analysis Partitions 

Fig. 2 Each analysis partition is represented by a low-dimensional signature. Panels: Analysis 
Partitions, Signatures 
Table 1 Signature functions 

Signature   Description 
fieda       Vector of feature importance values described in Ling et al. (2017) 
fmm         Vector based on the Feature Moment Metric algorithm in Konduri et al. (2018) 
mean        Vector of mean values for each simulation feature 
minimax     Vector of minimum and maximum values for each simulation feature 
quartile    Vector of quartile boundaries for each simulation feature (a generalization 
            of minimax) 
svd         Vector of singular values computed using an SVD on the flattened partition 
            state matrix 




Fig. 3 Signatures can be compared all-to-all across analysis partitions to identify spatial events 
(left), and current signatures can be compared to previous signatures within partitions to identify 
temporal events (right) 


of the state of the simulation within that partition, stored in such a way that changes 
across space or time can be detected by subsequent analysis of that representation. 
As an example, for a simulation with state variables $F \in \mathbb{R}^n$, a signature could be a 
vector of size $2|F|$ containing the minimum and maximum value for each variable 
$f \in F$ within the partition. Of course, this is only one possible signature type among 
many (we call this type minimax); we provide the subset of signature functions used 
in our experiments in Table 1. Note that, because analysis partitions are always asso- 
ciated with a single processor, computing signatures can be a purely local operation. 
Further, because signatures are a small, fixed size relative to the partitions they rep- 
resent, they can be broadcast to other processors for spatial (partition-to-partition) 
comparisons and stored between timesteps for temporal (timestep-to-timestep) com- 
parisons (Fig. 3). The user can choose, based on domain knowledge and the problem 
specifics, the set of features used to compute signatures. This set could consist of 
all of the state variables, a subset, derived variables, or any combination thereof; the 
only requirement is that the same set of features be used across all analysis partitions. 
Given a set of signatures, we can compute spatial or temporal measures to identify 
events. Measures are functions applied to signatures that detect changes across space 
Table 2 Spatial measures 

Measure        Description 
dbscan         Uses DBSCAN (Ester et al. 1996) to flag outlier signatures as events 
m1             M1 metric described in Ling et al. (2017) 
m1-hellinger   Modified version of the Hellinger distance used as a spatial metric, 
               described in Konduri et al. (2018) 
msd            Compares the distance between one signature and the mean signature for 
               all partitions 
sigscal        Normalizes the signature matrix using the product of the inverse of each 
               signature and measures the ability of each signature to drive values in 
               the product to zero 


Table 3 Temporal measures 

Measure      Description 
changefreq   Counts the number of changes (dramatic increases or decreases across any 
             two timesteps within a temporal window) 
maxchange    Uses the maximum change across any timesteps within the current 
             temporal window 
mse          Based on mean squared error between the current temporal block 
             ($S \times T_{window}$ matrix) and previous temporal block 
psd          Estimates power spectral density (power spectrum) for each feature within 
             the temporal block ($S \times T_{window}$ matrix) using Welch’s method 
             (Welch 1967) 
svd          Ratio of the largest non-zero singular value to the smallest non-zero 
             singular value 


or time. Spatial measures compare signatures across analysis partitions to identify 
spatial events; typically, they compare the signature for a given partition to every other 
partition’s signature, which requires communication. Temporal measures compare 
an analysis partition’s current signature to its past signatures and are thus completely 
local, requiring only storage of a finite number of signatures from previous timesteps. 
In both cases, the output of the measure is a per-analysis-partition continuous scalar 
value indicating how interesting the partition’s state is at the current timestep. We 
list representative spatial and temporal measures implemented for our experiments 
in Tables 2 and 3, respectively. 
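
To make the signature and measure steps concrete, the following NumPy sketch partitions a 2D field, computes minimax signatures, and evaluates one spatial and one temporal measure. It is an illustrative reading of the table descriptions, not the reference implementation: the function names, the use of the Euclidean norm in `msd_measure` and `maxchange_measure`, and the synthetic field are all assumptions.

```python
import numpy as np

def partition(field, size):
    """Split a (ny, nx, n_features) array into square analysis partitions."""
    ny, nx, nf = field.shape
    blocks = field.reshape(ny // size, size, nx // size, size, nf)
    # Result shape: (n_partitions, cells_per_partition, n_features)
    return blocks.transpose(0, 2, 1, 3, 4).reshape(-1, size * size, nf)

def minimax_signature(parts):
    """Per-partition vector of length 2|F|: feature minima, then maxima."""
    return np.concatenate([parts.min(axis=1), parts.max(axis=1)], axis=1)

def msd_measure(signatures):
    """Spatial: distance from each signature to the mean signature (assumed Euclidean)."""
    return np.linalg.norm(signatures - signatures.mean(axis=0), axis=1)

def maxchange_measure(history):
    """Temporal: largest step-to-step signature change within a window.

    history has shape (n_timesteps, n_partitions, signature_length).
    """
    steps = np.linalg.norm(np.diff(history, axis=0), axis=2)
    return steps.max(axis=0)

# Synthetic two-feature field with a localized disturbance in one partition.
rng = np.random.default_rng(1)
state = rng.normal(0.0, 0.1, size=(16, 16, 2))
state[8:12, 4:8, 0] += 5.0
sigs = minimax_signature(partition(state, 4))   # shape (16, 4)
spatial = msd_measure(sigs)                     # one scalar per partition
```

In this toy case the disturbed block (row 2, column 1 of the 4 × 4 partition grid, index 9) receives the largest spatial measure. Only the 4-element signatures, not the 16-cell partitions, would need to be exchanged between processors.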

Finally, we use decision functions to convert continuous per-analysis-partition 
measures into boolean values to indicate whether the partitions should be flagged as 
containing events of interest for the current timestep. Decision functions are purely 
local, requiring no communication. Table 4 describes the decision functions that we 
used in our experiments. 
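
The three decision functions can be sketched as follows. This is a minimal interpretation of the descriptions in Table 4; the names `threshold_decision`, `percentile_decision`, and `MemoryDecision`, and the `hold` parameter, are hypothetical.

```python
import numpy as np

def threshold_decision(measures, threshold):
    """Flag partitions whose measure exceeds a fixed threshold."""
    return measures > threshold

def percentile_decision(measures, n):
    """Flag partitions whose measure exceeds the nth percentile of all measures."""
    return measures > np.percentile(measures, n)

class MemoryDecision:
    """Keep a partition flagged for `hold` timesteps after the wrapped decision flags it."""

    def __init__(self, decision, hold):
        self.decision = decision
        self.hold = hold
        self.remaining = None

    def __call__(self, measures):
        flagged = self.decision(measures)
        if self.remaining is None:
            self.remaining = np.zeros(flagged.shape, dtype=int)
        # Count down existing flags, then reset the counter wherever a new flag fires.
        self.remaining = np.maximum(self.remaining - 1, 0)
        self.remaining[flagged] = self.hold + 1
        return self.remaining > 0

# Usage: partition 0 spikes once and stays flagged for two further timesteps.
decide = MemoryDecision(lambda m: threshold_decision(m, 1.0), hold=2)
sequence = [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
flags = [decide(np.array(m)).tolist() for m in sequence]
```

Wrapping one decision in another, as `MemoryDecision` does, is one way to express a "decision modifier" without changing the measure that feeds it.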

We refer to a combination of signature, measure, and decision functions as an 
algorithm for in situ event detection; because we have many instances of each type, 
and they can be combined almost without exception, there are many possible algo- 
Table 4 Decision functions 

Decision     Description 
threshold    Flag a partition when its measure exceeds a fixed threshold 
percentile   Flag a partition when its measure exceeds the nth percentile of the 
             measure for all partitions 
memory       Decision modifier which continues to flag a partition for a fixed number of 
             timesteps after another decision function has flagged it 


rithms that can be created with just a few components (and the set of components 
continues to grow as we explore new ideas). The few incompatibilities tend to be 
driven by the expected inputs for a component. For example, it makes little sense 
to combine the dbscan measure with the percentile decision, since the former only 
produces binary values as output, and the latter is only useful with a continuous 
distribution as input. 


3 Results 


In this section we demonstrate our methodology on three important use cases 
for in situ machine learning: data capture for optimizing I/O (Sect. 3.1), detection 
of interesting physical events (Sect. 3.2), and facilitating reduced order model con- 
struction (Sect. 3.3). The use cases represent different scientific domains, but have 
similarities with reacting flows: Sect. 3.1 pertains to low speed non-reacting tur- 
bulent flows with passive tracers; Sect. 3.2 pertains to an incompressible fluid flow 
(glacier ice) solved using Stokes flow equations; Sect. 3.3 pertains to supersonic flow 
with shock. The purpose behind choosing such different use cases is to illustrate the 
generality of our detection algorithms. 


3.1 Data Capture for Optimal I/O: Mantaflow Experiments 


In our initial round of experiments, our focus is on testing the utility of our framework, 
and quantifying whether it could be used for meaningful reductions in I/O. We 
begin by creating a reference implementation using Python (2022), NumPy (Walt 
et al. 2011), SciPy (Jones et al. 2001) and scikit-learn (Pedregosa et al. 2011). To 
simplify development and support rapid iteration, these experiments use Mantaflow 
(Thuerey and Pfaff 2018), an open source library targeting fluid simulation research 
in computer graphics and machine learning, for the simulation. Despite being a serial 
code, Mantaflow’s Python scene definition interface makes it ideal for integration and 
rapid testing with our algorithms. All of our Mantaflow experiments are conducted 
using two-dimensional (2D) simulations for speed and ease of visualization. 
Fig. 4 Density field visualization from the small plumes Mantaflow simulation at one timestep. 
Darker colors signify higher density 


To run the simulations, we created a driver script that loads an experiment defi- 
nition file specifying the simulation setup, analysis partitions, simulation features to 
use for signature generation, as well as the signature, measure and decision functions 
to use for the analysis. Because the driver script also provides the simulation outer 
loop, it is trivial to run our analysis code alongside the simulation in situ. 

We designed several Mantaflow simulations to test our event detection approach 
at different scales; for this chapter, we focus on our small plumes simulation, which 
has four state variables (density, pressure, x-velocity and y-velocity) and features 
three steady turbulent plumes of buoyant fluid using a 64 × 256 grid and running for 
300 timesteps (Fig. 4). 

Since the goal for our I/O use case is to minimize the amount of data saved to 
disk while simultaneously maximizing the number of events captured, a fundamental 
challenge is defining a sensible ground truth: for any given simulation, there is no 
well-defined way to specify which parts of the simulation should be considered events 
of interest (and thus flagged by our framework for subsequent storage to disk). To 
address this, we opted to create our own explicit ground truth by injecting random 
“depth charge” anomalies into the simulation. To do so, we generate a random value 
for each simulation cell at each timestep. At any cell where the random value exceeds 
a threshold, the simulation density is increased by a substantial amount, and the cell is 
marked as anomalous using an additional simulation state variable. Thus, the depth 
charge anomalies occur at random timesteps and locations within the simulation 
domain, and the anomalies state variable keeps track of where they occur (Fig. 5). The 


Fig. 5 Per-cell ground truth for the small plumes simulation, at the same timestep as Fig. 4. The 
dark cells are anomalies, intentionally introduced by our “depth charges” 
overall impact is to introduce physically implausible aberrations into the simulation, 
which surely qualify as events worthy of detection. Having created the anomalies 
ourselves, we can then evaluate the algorithm’s ability to flag them as events of 
interest. Note that, even with our explicitly injected anomalies, there is still ambiguity 
surrounding the question of which cells/partitions should be flagged as events: while 
the sudden onset of an anomaly is obviously an event worth noting, the threshold 
at which it should cease to be anomalous as it disperses is still arbitrary. Despite 
these shortcomings, our “depth charges” provide a quantitative way to compare 
performance among different algorithms tested using the framework. 
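
The "depth charge" injection can be sketched as below. The function name, the per-cell probability, and the size of the density boost are assumptions chosen for illustration; the chapter does not state the values used in the experiments.

```python
import numpy as np

def inject_depth_charges(density, rng, probability=1e-3, boost=10.0):
    """Return (perturbed density, boolean ground-truth mask) for one timestep."""
    draws = rng.random(density.shape)        # one random value per simulation cell
    anomalies = draws > 1.0 - probability    # cells whose draw exceeds the threshold
    perturbed = density.copy()
    perturbed[anomalies] += boost            # substantial density increase
    return perturbed, anomalies

rng = np.random.default_rng(42)
density = np.zeros((64, 256))                # grid size of the small plumes case
perturbed, truth = inject_depth_charges(density, rng, probability=0.01)
```

The returned mask plays the role of the extra "anomalies" state variable: it records exactly which cells were perturbed, so recall can be computed later without any judgment about what counts as an event at the moment of injection.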

The behavior of our driver script is as follows. First, at each timestep, we use 
the Mantaflow API to run the solver for that step. Next, we extract the simulation 
state (density, pressure, velocities and anomaly ground truth) and save the raw data 
to disk. We then divide the simulation grid into 8 × 8 analysis partitions, since our 
framework requires multiple analysis partitions even when there is a single processor, 
as is the case for the serial Mantaflow simulations. Next, we compute the per-partition 
signatures. To support computing temporal measures and because the Mantaflow 
simulations are so small, we store every signature computed at every timestep, though 
we assume in practice that an HPC simulation would retain a smaller number of the 
most recent signatures. The set of per-analysis-partition signatures is then passed to 
the measure function to generate per-partition measures. Since the measure function 
has access to the signatures for every partition and every timestep, it can calculate a 
measure based on a comparison of signatures across every analysis partition (a spatial 
measure), a comparison of signatures across time for a single partition (a temporal 
measure), or a hybrid of the two. Because our Mantaflow experiments run on a single 
process, no communication is necessary, unlike the HPC experiments described in 
Sect. 3.4. We save the measures computed for each partition to disk for subsequent 
visualization. Finally, the measure values are passed to the decision function to be 
flagged as events or not, and those decisions are written to disk. 

Once the simulation is complete, we convert the simulation features, anomalies, 
measures and decisions stored on disk to color-mapped images, generating movies 
using the open source Imagecat (2022) library for compositing and FFmpeg (2019) 
for encoding. The simulation movies provide a qualitative way to evaluate algorithm 
behaviors (Fig. 6). 

For quantitative comparisons, we used the decision data to generate several met- 
rics, including: (1) the percentage of simulation domain cells that are flagged as 
events by our framework, both per-timestep and for the simulation as a whole, and 
(2) the percentage of ground truth anomalous cells that are contained within parti- 
tions flagged as events, per-timestep and for the simulation as a whole. We refer to 
this latter metric as “recall”. 
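
Both metrics are straightforward to compute from the per-partition decisions and the per-cell ground truth for a single timestep. The sketch below assumes square, equal-size partitions on a regular grid; the function names are hypothetical.

```python
import numpy as np

def flagged_fraction(decisions):
    """Percentage of the domain flagged (equal-size partitions assumed)."""
    return 100.0 * decisions.mean()

def recall(truth_cells, decisions, size):
    """Percentage of anomalous cells lying inside flagged partitions."""
    # Expand the per-partition decisions back to a per-cell mask.
    cell_mask = decisions.repeat(size, axis=0).repeat(size, axis=1)
    total = truth_cells.sum()
    return 100.0 * (truth_cells & cell_mask).sum() / total if total else float("nan")

# Toy 8 x 8 grid with 4 x 4 partitions: two of three anomalies are captured.
truth = np.zeros((8, 8), dtype=bool)
truth[0, 0] = truth[5, 5] = truth[7, 0] = True
decisions = np.array([[True, False], [False, True]])
saved = flagged_fraction(decisions)
captured = recall(truth, decisions, size=4)
```

Here half of the domain is flagged and two of the three anomalous cells fall in flagged partitions. Summing the numerators and denominators over all timesteps gives the whole-simulation versions of the same metrics.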

Our early experiments were focused on identifying useful combinations of 
signature-measure-decision building blocks and developing intuition around their 
strengths and weaknesses. In this preliminary exploration, the percentage of simula- 
tion cells flagged as events ranges from 4.3% (excellent, a twenty-fold decrease in 
storage requirements) to 75% (likely not worth the effort), while our recall metric 
ranges from 35.4% (good) to 99.8% (excellent). One combination that produces con- 
Fig. 6 Sample frame from a Mantaflow experiment movie: simulation state (a)–(d), 
per-analysis-partition measure (e) and decisions (j), simulation state masked by decisions (f)–(i) 


sistently good results for a wide range of parameters used is the quartile signature, 
dbscan spatial measure with Euclidean distance, and threshold decision function. 
Figure 7 plots the total percentage of flagged analysis partitions (lower is better) 
versus the anomaly recall (higher is better) for a set of experiments using this com- 
bination. The result is intentionally evocative of a receiver operating characteristic 
curve, emphasizing the trade-offs inherent in our desire to maximize the number of 
detected events while minimizing the total number of analysis partitions flagged for 
storage to disk. 

The dbscan measure used in these experiments has two main parameters: $\varepsilon$, the 
threshold distance below which two signatures are considered “neighbors”; and $N_p$, 
the minimum number of neighboring signatures required to form a “neighborhood.” 
Once all of the neighborhoods in a collection of signatures are identified, any signa- 
tures not in a neighborhood are, by definition, flagged as interesting events. 
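
The outlier rule can be written directly in NumPy, which makes the role of the two parameters explicit. This sketch reproduces only the noise-labeling part of DBSCAN and is not the implementation used in the experiments; the function name is hypothetical, and counting a signature as its own neighbor follows the scikit-learn convention for `min_samples`, which is an assumption here.

```python
import numpy as np

def dbscan_outliers(signatures, eps, min_neighbors):
    """Boolean mask of signatures that DBSCAN would label as noise."""
    # Pairwise Euclidean distances between all signatures.
    diffs = signatures[:, None, :] - signatures[None, :, :]
    dist = np.linalg.norm(diffs, axis=2)
    neighbors = dist <= eps
    # Core signatures have at least min_neighbors signatures within eps (self included).
    core = neighbors.sum(axis=1) >= min_neighbors
    # A non-core signature within eps of a core signature is a border point, not noise.
    near_core = (neighbors & core[None, :]).any(axis=1)
    return ~(core | near_core)

# Ten tightly clustered signatures plus one far away.
sigs = np.vstack([0.01 * np.arange(10)[:, None] * np.ones((1, 2)), [[5.0, 5.0]]])
outliers = dbscan_outliers(sigs, eps=0.2, min_neighbors=3)
```

Only the isolated signature is flagged. Raising `eps` makes more signatures count as neighbors, so fewer are flagged, which is consistent with recall falling as the distance threshold grows.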

We tested combinations of $\varepsilon$ and $N_p$ using grid search, varying $\varepsilon$ between 
0.1 and 1.0 and $N_p$ between 1% and 50% of the total number of analysis 
partitions. At very low values of $\varepsilon$, we rapidly achieved high recall, approaching 
100%. Values over 0.3 led to a rapid reduction in recall, dropping to around 8% for 
$\varepsilon = 1.0$. Varying $N_p$ had much less effect, with most values below 40% having 
little effect on recall. We are encouraged that many parameter combinations produce 
results near the knee of the curve in Fig. 7, indicating that the algorithm is robust for 
a wide range of reasonable DBSCAN parameters. We chose $\varepsilon = 0.2$ and $N_p = 2\%$ 
as the best parameters for this data, with results shown in Fig. 8. 
Fig. 7 Flagged analysis partitions versus anomaly recall for the quartile-dbscan Mantaflow 
experiments 

Fig. 8 Flagged analysis partitions (top) versus recall (bottom) for quartile-dbscan Mantaflow 
experiment with $\varepsilon = 0.2$ and $N_p = 2\%$ 




Fig. 9 Saved analysis partitions (top) versus recall (bottom) for a simulation saving a checkpoint 
at every tenth timestep for the Mantaflow experiment. Panel titles: “Total checkpoint partitions: 
10.0%” (top; checkpoint partitions (%) versus timestep) and “Total anomalous cells: 1253, 
Recall: 10.0%” (bottom; anomalous cells (%) versus timestep) 


In this case, an experimenter using the quartile-dbscan algorithm to decide which 
analysis partitions should be saved to disk would end up capturing 98.3% of the 
anomalies, while storing just 12.1% of the data. This is especially striking when we 
compare it to typical uniform temporal check-pointing of HPC simulation data: the 
experimenter who simply saves the entire simulation state at every tenth timestep as 
in Fig. 9 would use roughly the same amount of disk space (10% vs. 12.1%), while 
only capturing 10% of the interesting events! 

We performed temporal anomaly detection experiments using similar techniques. 
One comparable result used the minimax signature, the maxchange measure, and the 
threshold decision function, producing a recall of 96.3% while flagging only 24% of 
the data. 
3.2 Detecting Physical Phenomena: Marine Ice Sheet Instability (MISI) 


While the in situ event detection framework described herein was originally devel- 
oped for the purpose of optimizing HPC simulation output, the proposed approach 
can also be used to detect physical phenomena present in HPC simulation data to 
further our understanding of the underlying physical processes. Here, we describe 
a specific instance of this use case, in which our framework facilitates the study of 
the hypothesized Marine Ice Sheet Instability (MISI) using simulation data from the 
MPAS-Albany Land Ice (MALI) model (Hoffman et al. 2018), the land ice com- 
ponent of the U.S. Department of Energy’s Energy Exascale Earth System Model 
(E3SM) (Leung et al. 2020). 

The Marine Ice Sheet Instability, first introduced in the 1970s (Weertman et al. 
1974; Thomas and Bentley 1978), hypothesizes that ice sheets grounded below sea- 
level may destabilize in a runaway fashion once the grounding line, the boundary 
between where the ice sheet is grounded and floating, reaches a point where the 
bedrock has a reverse slope gradient (Fig. 10) (Bamber et al. 2009). Once the bedrock 
beneath the grounding line is reverse sloping (i.e., it becomes deeper moving inland), 
ice thickness at the grounding line increases, leading to faster ice flow and greater ice 
flux divergence. As the flux at the grounding line increases, thinning at and upstream 
of the grounding line increases, causing the boundary between floating and grounded 
ice to move further inland. The result is a self-reinforcing mechanism that can cause 
rapid and irreversible ice sheet retreat and rapid sea level rise (Robel et al. 2019; 
Joughin and Alley 2019). Since the grounding line is often stabilized by the presence 
of an ice shelf (an extended region of floating ice that is dynamically connected to the 
grounded ice upstream of it), which has the effect of buttressing the ice and limiting 
ice flux at the grounding line, MISI is often triggered by the thinning or loss of ice 
shelves (Pattyn and Morlighem 2020). Satellite and modeling evidence suggests that 
MISI is underway in parts of the West (e.g., the Thwaites and Pine Island glaciers) 
and East (e.g., the Totten glacier) Antarctic Ice Sheet (Robel et al. 2019; Joughin 
and Alley 2019; Gardner et al. 2018; Young et al. 2011). While it is theoretically 
possible to identify locations prone to MISI by combining bedrock elevation data 
with information on retrograde bedrock slopes, this approach is not feasible since 
bedrock elevation data are limited. Moreover, the retrograde bed slope alone is likely 
not a sufficient proxy for MISI, as it does not take into account important features 
relevant to MISI, e.g., ice flow speed and ice flux. 

Fig. 10 Marine Ice Sheet Instability triggered by an unstable grounding line retreat on retrograde 
bedrock slope. Figure adapted from Pattyn and Morlighem (2020) 

Our approach herein is to investigate the utility of our event detection framework 
in identifying the onset of MISI. Accordingly, we applied our event detection 
algorithms to two simulation datasets: (1) an idealized Antarctic BUttressing Model 
Intercomparison Project (ABUMIP) simulation (Sun et al. 2020), and (2) a predictive 
simulation of the Antarctic Ice Sheet with realistic climate forcing (Seroussi et al. 
2020). Following the naming conventions introduced in Sun et al. (2020) and Seroussi 
et al. (2020), we refer to these datasets as abuk and exp05, respectively. 
Both simulations start with a realistic present-day initial condition obtained 
by performing an adjoint-based optimization using the MALI model (Perego et al. 
2014). They then simulate ice flow over Antarctica on a variable-resolution three- 
dimensional (3D) tetrahedral grid. The output from these simulations is subsequently 
mapped onto a 2D structured quadrilateral grid having a uniform resolution of 8 km 
(Fig. 11), for the purposes of analysis and comparison to other land ice models 
(Seroussi et al. 2020). In the abuk experiment, Antarctica’s ice shelves are removed 
instantaneously, and we perform a simulation in which the formation of new floating 
ice is prevented and no change in external atmospheric or oceanic forcing is applied. 
Although unrealistic, this scenario provides an extreme upper bound on sea-level con- 
tributions from Antarctica, and exhibits the full potential of MISI (Sun et al. 2020). 
As such, the abuk dataset is ideal for “calibrating” (i.e., determining a reasonable 
set of features and analysis partition sizes) and “validating” (i.e., ensuring that rea- 
sonable analysis partitions are flagged as interesting) our event detection framework 
before applying it to the more realistic exp05 scenario. The second experiment, 
exp05, is a standard test case in the ISMIP6 (Ice Sheet Model Intercomparison 
Project 6) experiments (Seroussi et al. 2020), and is meant to be a realistic predictive 
simulation of the Antarctic Ice Sheet state with atmospheric and oceanic forcing¹ 
under the RCP8.5 (Representative Concentration Pathway 8.5) (IPCC 2021) radia- 
tive forcing emissions scenario, which corresponds to the likely outcome if society 
does not make concerted efforts to cut greenhouse emissions during the remainder 
of the twenty-first century (Edwards et al. 2021). For initial prototyping, our event 
detection algorithms are applied to the datasets a posteriori; integration of these 
algorithms into the MALI code for true in situ analyses will be the subject of future 
work. For the abuk dataset, there are 51 solution snapshots, corresponding to a 
500-year simulation with data saved every 10 years; for exp05, there are 86 solution 
snapshots, corresponding to an 85-year simulation with data saved every year. 

¹ For details regarding these forcings, the reader is referred to Table 2 of Seroussi et al. (2020). 

Fig. 11 “Full” 6088 km × 6088 km domain for the exp05 dataset, with active cells shown in blue. 
Left panel shows a close-up of the Antarctic peninsula and the structured 8 km quadrilateral mesh 
with which the problem is discretized 

Prior to presenting our main results, we discuss some nuances pertaining to the 
generation of analysis partitions for the land ice datasets considered herein. For both 
the abuk and exp05 datasets, the underlying computational domain onto which the 
MALI output is mapped is a 6088 km × 6088 km square grid, discretized using 761 
quadrilateral elements in each coordinate direction (Fig. 11). To determine which 
cells within this computational domain are “active” (ice-covered), a time-dependent 
mask was computed at each timestep from an ice thickness criterion: only cells in 
which the ice thickness exceeds 10 m are deemed “active” at that timestep. An 
important feature of masks derived in this way is that they, and hence the geometries 
on which the simulation proceeds, change in time: before solving for the ice sheet 
state at each timestep, inactive cells are removed from the mesh. While it would be 
possible to uniformly partition the “full” 761 × 761 element grid into P analysis 
partitions to use for our event detection workflow, such an approach would lead to an 
imbalanced set of partitions, in which many partitions would have few (or even zero) 
elements. Using an analysis partition set of this type could bias the event detection, 
especially when statistics-based signatures are employed. One approach to avoid this 
problem is to partition only the active grid, but this second approach also has several 
downsides: (1) its computational cost would likely preclude in situ analyses, and (2) 
with analysis partitions that change in time, it is not clear how to track temporal 
events using this methodology. To avoid these issues, we adopted a third approach, 
in which we created a mask (termed the “analysis partitioning mask”) that was only 
slightly larger than the maximum ice extent across all simulation times for a given 
dataset, and created a single partition of the geometry defined by this mask prior to 
performing event detection. In the present study, we consider two types of analysis 
partitioning masks: 




[Fig. 12 panels: (a) abuk experiment, (b) exp05 experiment. Top panels are plotted against Global x ID; bottom panels are plotted against Partition ID.] 


Fig. 12 Illustration of 500 analysis partitions (top panel) obtained using k-means clustering, and 
cell counts for each analysis partition (bottom panel) for the active mesh with buffer (a) and the 
union of active meshes (b) analysis partitioning masks. The former analysis partitioning mask was 
used for the abuk experiment, and the latter was used for the exp05 experiment. Different colors 
in the top panel represent distinct partitions 


• Active mesh with buffer: a mask is created that includes a buffer region around the 
  maximum footprint of the underlying Antarctic geometry (Fig. 12a); 

• Union of active meshes: a mask is created by performing a union of the active cells 
  across all simulation times (Fig. 12b). 
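
The two mask types listed above can be sketched in a few lines of NumPy. This is only an illustration on toy data: the function names, the 4-neighbour dilation used to build the buffer, and the buffer width are assumptions made here, not details taken from the MALI workflow. The 10 m thickness threshold is the one quoted in the text.

```python
import numpy as np

def active_mask(thickness, min_thickness=10.0):
    """Cells whose ice thickness exceeds the threshold (10 m in the text)."""
    return thickness > min_thickness

def union_mask(thickness_series, min_thickness=10.0):
    """Union of the active cells across all snapshots (needs the full time series)."""
    return np.any(thickness_series > min_thickness, axis=0)

def buffered_mask(mask, width=1):
    """Grow a mask by `width` cells in the four grid directions (assumed buffer rule)."""
    grown = mask.copy()
    for _ in range(width):
        padded = np.pad(grown, 1, constant_values=False)
        grown = (padded[1:-1, 1:-1] | padded[:-2, 1:-1] | padded[2:, 1:-1]
                 | padded[1:-1, :-2] | padded[1:-1, 2:])
    return grown

# Toy data: 3 snapshots on a 6 x 6 grid, with the ice retreating over time.
thickness = np.zeros((3, 6, 6))
thickness[0, 1:5, 1:5] = 100.0
thickness[1, 2:5, 1:5] = 100.0
thickness[2, 3:5, 1:5] = 100.0

per_step = [int(active_mask(h).sum()) for h in thickness]   # active cells per snapshot
union = union_mask(thickness)                                # requires all snapshots
buffered = buffered_mask(active_mask(thickness[0]), width=1) # needs only one footprint
```

The sketch also shows the trade-off discussed in the text: the buffered mask can be built from a single footprint, whereas the union mask needs every snapshot before partitioning can start.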


Each approach to analysis partitioning mask creation has its pros and cons. The former 
approach is amenable to in situ analyses, but is likely to give rise to some analysis 
partitions with few or no elements. The latter approach minimizes the likelihood of 
empty/imbalanced analysis partitions, but would not be possible to generate in situ. 
Our preliminary numerical results, described below, suggest that both approaches to 
creating the analysis partitioning mask produce reasonable results for the datasets 
considered. 

Having settled on an approach for dealing with the temporal variability of the 
active mesh in our land ice datasets, we now discuss the choice of partitioning scheme 
for generating the analysis partitions required by our event detection algorithm. We 
explored the use of several partitioning algorithms, including space-filling curve 
partitioning (e.g., Hilbert, Morton) (Sasidharan et al. 2015), quad-tree partitioning 
(Ansar et al. 2019), and k-means clustering (Hartigan and Wong 1979). Of these 
three approaches, k-means clustering produced the most balanced analysis partitions 
(Fig. 12): each partition has roughly the same number of cells, and the partition 
sizes appear to be normally distributed around the target number of cells per 
partition (Fig. 12, bottom panel). Our results below utilize the k-means partitioning 
algorithm implemented within Scikit-Learn (Pedregosa et al. 2011), seeded with a 
random initialization. In contrast, applying the space-filling curve and quad-tree 
partitioning approaches to our datasets gives rise to partition sizes ranging from a 
single cell to the maximum number of cells per partition requested (partitions not 
shown). As mentioned earlier, having analysis partitions of widely disparate sizes is 
particularly problematic for statistics-based signatures within our framework, since 
these signatures are highly dependent on the number of cells per partition. 
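
As a rough illustration of how k-means can turn a masked grid into analysis partitions, the sketch below clusters the coordinates of the active cells and counts the cells in each partition. To stay self-contained it uses a plain Lloyd iteration written in NumPy rather than the Scikit-Learn implementation used for the results in this chapter; the disk-shaped mask, the grid size, and the function name are hypothetical.

```python
import numpy as np

def kmeans_partition(coords, n_partitions, n_iter=50, seed=0):
    """Plain Lloyd iteration with random initialization; returns one label per cell."""
    rng = np.random.default_rng(seed)
    centers = coords[rng.choice(len(coords), size=n_partitions, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every cell to its nearest center.
        dist2 = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        # Move each center to the mean of its cells (left in place if it has none).
        for k in range(n_partitions):
            members = coords[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return labels

# Toy analysis partitioning mask: a disk of active cells on a 40 x 40 grid.
iy, ix = np.mgrid[0:40, 0:40]
mask = (ix - 20) ** 2 + (iy - 20) ** 2 <= 15 ** 2
coords = np.argwhere(mask).astype(float)        # (n_cells, 2) cell coordinates

target_cells = 16                                # target cells per partition, as in the text
n_partitions = len(coords) // target_cells
labels = kmeans_partition(coords, n_partitions)
counts = np.bincount(labels, minlength=n_partitions)   # cells per analysis partition
print(counts.min(), counts.mean(), counts.max())
```

Inspecting `counts` plays the same role as the bottom panel of Fig. 12: it shows how far the partition sizes spread around the target.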

As discussed in Sun et al. (2020) and Seroussi et al. (2020), the abuk and exp05 
datasets contain a number of fields that can be used as features in our event detection 
workflow. In the preliminary study presented here, we considered the following four 
solution fields as features, denoted by $F_i$ for $i = 1, \ldots, 4$: 

• $F_1$: the ice sheet thickness, 

• $F_2$: the norm of the ice velocity at the ice surface, 

• $F_3$: the norm of the ice velocity at the ice base, 

• $F_4$: the norm of the ice velocity averaged over the vertical extent of the ice. 

The ice sheet thickness is selected as a feature because it is a function of the bedrock 
geometry/topography; the ice velocity fields are used as features as fast-moving ice 
may correlate with the presence of MISI. In addition to employing the raw solution 
fields $F_i$ in our analysis, we also considered logarithms of these fields, denoted by 
$\log(F_i)$. We employed the quartile signature (Table 1), the dbscan measure with 
parameters $\varepsilon = 0.3$ and N, = 5% (Ester et al. 1996) (see Table 2 and Sect. 3.1 for 
a discussion of this measure and parameters) and the threshold decision (Table 4). 
In this initial proof-of-concept study, only spatial events of interest were considered. 
The threshold decision flagged partitions with a measure less than zero. The k-means 
clustering algorithm was used to generate 14,000 partitions, each having 
approximately 16 cells, for both the abuk and exp05 experiments. For the abuk 
dataset, we partitioned the active mesh with a buffer region around it (Fig. 12a), 
whereas for the exp05 dataset, we partitioned an active mesh consisting of the 
union of all active meshes during the simulation (Fig. 12b). 
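
The signature, measure and decision used here are defined in Tables 1, 2 and 4, which lie outside this section, so the following is only a guess at how the three pieces might fit together. It assumes that the quartile signature stacks the 25th, 50th and 75th percentiles of each feature, that signatures are min-max scaled, that the dbscan measure is negative exactly for partitions DBSCAN would label as noise, and that the 5% parameter is the minimum neighbourhood size expressed as a fraction of the number of partitions. The noise test is written out in NumPy instead of calling a clustering library.

```python
import numpy as np

def quartile_signature(features, labels, n_partitions):
    """Stack the 25th/50th/75th percentiles of every feature for each partition."""
    sig = np.empty((n_partitions, 3 * features.shape[1]))
    for k in range(n_partitions):
        q = np.percentile(features[labels == k], [25, 50, 75], axis=0)  # (3, n_features)
        sig[k] = q.ravel()
    return sig

def dbscan_noise(signatures, eps, min_samples):
    """True where DBSCAN would label a point as noise: not core, and not near a core point."""
    d = np.linalg.norm(signatures[:, None, :] - signatures[None, :, :], axis=2)
    neighbors = d <= eps
    core = neighbors.sum(axis=1) >= min_samples          # counts include the point itself
    reachable = (neighbors & core[None, :]).any(axis=1)  # within eps of some core point
    return ~core & ~reachable

# Toy data: 100 partitions of 16 cells with one nearly uniform feature.
rng = np.random.default_rng(0)
n_partitions, cells_per_partition = 100, 16
labels = np.repeat(np.arange(n_partitions), cells_per_partition)
features = 1.0 + 0.01 * rng.standard_normal((labels.size, 1))
features[labels == 7] += 4.0      # two partitions that behave differently
features[labels == 42] += 8.0

sig = quartile_signature(features, labels, n_partitions)
sig = (sig - sig.min(axis=0)) / (sig.max(axis=0) - sig.min(axis=0))   # min-max scaling (assumed)
min_samples = int(np.ceil(0.05 * n_partitions))                        # "5%" read as a fraction
measure = np.where(dbscan_noise(sig, eps=0.3, min_samples=min_samples), -1, 0)
flagged = np.flatnonzero(measure < 0)     # threshold decision: measure less than zero
```

In this toy setting only the two perturbed partitions are isolated in signature space, so they are the ones flagged.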

Our main results are shown below, in Figs. 13, 14, 15 and 16, which plot the 
interesting analysis partitions in green, overlaying the ice thickness field feature used 
in the analysis. We emphasize that these results are preliminary and intended only 
to demonstrate the potential usefulness of the proposed framework in data-driven 
studies of land ice; scientific studies using our event detection framework will be the 
subject of future research. 

Fig. 13 Event detection results for the abuk experiment with the four raw fields $F_i$ for $i = 1, \ldots, 4$ as 
features. Analysis partitions identified as interesting are plotted in green, overlaying the ice thickness 
field for several years. Our results show that ice contained within analysis partitions identified as 
interesting in one timestep will in general melt (become inactive) in the following timestep 


3.2.1 Results for the abuk Experiment 


We first apply our event detection framework to the abuk dataset, as this dataset is 
most likely to contain evidence of MISI. Figure 13 shows snapshots of the solution 
for the abuk dataset at several times, with a close-up in the vicinity of the Pine 
Island and Thwaites glaciers. Analysis partitions identified as interesting using our 
algorithm when employing the full set of fields $\{F_i\}$ for $i = 1, \ldots, 4$ as features 
are plotted in green, overlaying the ice thickness field for several years. The reader 
can observe by inspecting this figure that cells comprising the analysis partitions 
identified as interesting in one timestep subsequently become inactive (based on the 
previously described active mask criterion) in the following timestep. In other words, 
the ice that is flagged by our algorithm melts shortly after it is flagged, a behavior 
consistent with MISI. 

Fig. 14 Event detection results for the abuk experiment with $F_1$ and $\log(F_4)$ as features. Analysis 
partitions identified as interesting are plotted in green, overlaying the ice thickness field for year 33 

Fig. 15 Event detection results for the exp05 experiment with the four raw fields $F_i$ for $i = 1, \ldots, 4$ 
as features. Analysis partitions identified as interesting are plotted in green, overlaying the ice 
thickness field for year 33. The grounding line is shown with a black contour. Our event detection 
framework identifies the fastest moving areas along Antarctica’s coast (ice shelves, outlet glaciers), 
where MISI is more likely to initiate 

Fig. 16 Event detection results for the exp05 experiment with $F_1$ and $\log(F_4)$ as features. Analysis 
partitions identified as interesting are plotted in green, overlaying the ice thickness field for year 
33. The grounding line is shown with a black contour 

Next, we perform event detection using a reduced set of features, namely $F_1$ and 
$\log(F_4)$. Figure 14 plots the anomalies identified by our framework in year 33 of the 
simulation, again in green and overlaying the ice thickness field. Significantly more 
partitions are flagged as interesting with the new set of features. This is not 
surprising, as applying a logarithm transform to an analysis feature when using the 
dbscan measure has the effect of emphasizing small differences in small-magnitude 
values. An additional noteworthy observation is that, with the new set of features, 
not all of the interesting analysis partitions identified by our algorithm are at or 
near the grounding line. In particular, several of the flagged locations lie a large 
distance inland. These locations appear to be regions where the ice retreats the 
fastest, and should be inspected further in search of MISI. 


3.2.2 Results for the exp05 Experiment 


Having obtained plausible results for the abuk experiment, we now turn our atten- 
tion to the more realistic exp05 case. Figure 15 plots results for the exp05 dataset 
corresponding to year 33 in the simulation, with analysis partitions identified as inter- 
esting plotted in green and the grounding line (the boundary between where the ice 
sheet is grounded and floating) plotted with a black contour. From this figure, one 
can see that our event detection framework identifies the fastest moving areas along 
Antarctica’s coast (the ice shelves and outlet glaciers) as events. These are locations 
where MISI is more likely to originate. In particular, the following glaciers are iden- 
tified as containing events of interest: Pine Island, Thwaites, Totten, Byrd, Recovery 
and Lambert (see Fig. 15). Observational evidence suggests MISI is underway at 
Thwaites, Pine Island and Totten glaciers (Robel et al. 2019; Joughin and Alley 
2019; Gardner et al. 2018; Young et al. 2011). The other regions identified as 
interesting by our framework are worth a closer look, in both model simulations 
and observational datasets, in search of MISI (Hoffman et al. 2022). 

The most intriguing result is shown in Fig. 16, which plots the interesting analysis 
partitions for the exp05 dataset with the new set of features, again in green. The 
reader can observe that our algorithm flags several regions located inland relative to 
the grounding line (shown by a black contour). Additionally, the analysis partitions 
identified as interesting on and near Antarctica’s ice shelves closely match the loca- 
tions that have a significant impact on grounding line flux identified by Reese et al. 
(2018). While a more rigorous study is required for validating this result, the fact 
that there is corroboration with previously published results appears promising. A 
more rigorous investigation, towards understanding the physical mechanisms driving 
the events identified by our framework, will be the subject of future work. Future 
work will also explore the use of alternate features in the event detection workflow 
(including lateral buttressing in shear zones, basal friction, and flux fields, such as the 
ice velocity flux divergence), as well as alternate signatures and measures, includ- 
ing temporal measures (Table 3). We additionally plan to apply our methodology to 
higher-resolution datasets (e.g., 3D unstructured datasets produced by running the 
MALI model/code (Hoffman et al. 2018)) and to land ice datasets expected to exhibit 
stochastic behavior, e.g., simulations that include parameterizations of physical pro- 
cesses for ice calving and subglacial hydrology. 

Finally, it is worth remarking that interesting events or anomalous behaviors iden- 
tified in land ice simulations using the proposed framework could be relevant for 
scientists even if they are not an indication of MISI. In this context, an analysis 
partition flagged by our framework could be indicative of something incorrect in the 
data or underlying land ice model (e.g., a software flaw or missing physics), or of 
interesting physical phenomena other than MISI. 


3.3 Reduced Order Modeling: Sample Mesh Generation for Hyper-Reduction 


To highlight the breadth of application spaces that can benefit from the proposed event 
detection algorithms, we discuss a fundamentally new use case for our framework 
within the field of projection-based model reduction. 

Projection-based reduced order modeling is a promising strategy for reducing the 
computational cost of high-fidelity HPC simulations, which are often too expen- 
sive for use in a design or analysis setting (e.g., optimization, UQ). Reduced order 
models (ROMs) have two key features: they are constructed to retain the essential 
physics and dynamics of their corresponding full order models (FOMs) and they 
incur a substantially lower (in some cases by orders of magnitude) computational 
cost. In projection-based model reduction, the state variables are approximated within 
a low-dimensional subspace, which is typically obtained offline by first applying data 
compression on a set of snapshots collected from a high-fidelity simulation or phys- 
ical experiment. A typical projection-based ROM workflow consists of three steps, 
depicted in Fig. 17 and described succinctly below. In this figure, and the discussion 
that follows, it is assumed that the FOM is given by the following nonlinear ordinary 
differential equation (ODE): 

$$\frac{dw}{dt} = f(w; t, \mu), \tag{1}$$

where $w$ denotes the solution vector, $t$ denotes time, and $\mu$ is a vector of parameters. 
Note that (1) is very generic: an ODE of the form (1) is obtained, for example, by 
semi-discretizing the partial differential equations (PDEs) defining the FOM in space 
using a numerical method, such as the finite element or the finite volume method. 

Fig. 17 Illustration of a projection-based model reduction workflow using the POD/LSPG method 
with hyper-reduction of a full-order model given by the ODE $\dot{w} = f(w; t, \mu)$. In this figure, $(\cdot)$ 
denotes “function of” rather than multiplication. The matrices and vectors appearing in this figure 
have the following dimensions: $X \in \mathbb{R}^{N \times K}$; $\Phi \in \mathbb{R}^{N \times M}$; $w, \dot{w}, f, r^n \in \mathbb{R}^N$; $\hat{w} \in \mathbb{R}^M$; $A \in \mathbb{R}^{q \times N}$; 
$\mu \in \mathbb{R}^L$, where $L \in \mathbb{N}$ is the number of parameters 




Step 1. Acquisition of high-fidelity snapshot data. The first step in a typical projection- 
based model reduction workflow is the acquisition of a set of K instantaneous snap- 
shots of a numerical solution field. Typically snapshots are collected for K values of 
a parameter of interest (see Fig. 17), at K different times, or both. 


Step 2. Learning a reduced basis. Given an ensemble of high-fidelity snapshots 
denoted by $\{w^k\}_{k=1}^K$, the next step is the calculation of a basis of reduced dimension 
$M \ll N$, where $N$ denotes the number of degrees of freedom (dofs) in the FOM. 
There are numerous approaches in the literature for computing a low-dimensional 
subspace, but we restrict the discussion here to the Proper Orthogonal Decomposition 
(POD) method (Sirovich 1987; Holmes et al. 1996) for calculating reduced bases, due 
to its simplicity and prevalence in practice. Mathematically, POD is closely related to 
Principal Component Analysis (PCA), and seeks an $M$-dimensional subspace (with 
$M \ll K$) spanned by a set of modes $\{\phi_i\}_{i=1}^M$, such that the difference between the 
snapshot ensemble $\{w^k\}_{k=1}^K$ and the projection of this ensemble onto the reduced 
subspace is minimized on average. It is a well-known result that the solution to the POD 
optimization problem reduces to a singular value decomposition problem involving 
the snapshot matrix $X$, as shown in Fig. 17; specifically, the modes $\{\phi_i\}_{i=1}^M$ are the 
$M$ left singular vectors corresponding to the $M$ largest singular values of $X$. The 
interested reader is referred to Holmes et al. (1996), Kunisch and Volkwein (2002), 
Rathinam and Petzold (2003) for details. 
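
The link between POD and the singular value decomposition can be illustrated with a short NumPy sketch. The snapshot matrix below is a toy field built from two spatial modes, chosen here only so that a two-mode basis reproduces it to round-off; the function name and the sizes are illustrative assumptions.

```python
import numpy as np

def pod_basis(X, M):
    """Return the M leading left singular vectors of the N x K snapshot matrix X."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :M], s

# Toy snapshots: K = 30 samples of a field built from two spatial modes.
N, K = 200, 30
x = np.linspace(0.0, 1.0, N)
t = np.linspace(0.0, 1.0, K)
X = (np.outer(np.sin(np.pi * x), np.cos(2 * np.pi * t))
     + 0.5 * np.outer(np.sin(2 * np.pi * x), np.sin(2 * np.pi * t)))

Phi, s = pod_basis(X, M=2)
# Relative error of projecting the snapshot ensemble onto the reduced subspace.
rel_err = np.linalg.norm(X - Phi @ (Phi.T @ X)) / np.linalg.norm(X)
```

In practice the decay of the singular values `s` is what guides the choice of $M$.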


Step 3. Projection-based reduction. The final step is the actual reduction, obtained 
by projecting the equations defining the FOM onto the reduced basis, denoted by 
$\Phi := [\phi_1, \ldots, \phi_M] \in \mathbb{R}^{N \times M}$. Common projection methods are Galerkin projection 
and Least-Squares Petrov-Galerkin (LSPG) projection; herein, we focus on the latter 
approach, as it has been shown to exhibit better stability properties, especially for 
fluid systems (Carlberg et al. 2017). This approach operates on a FOM that has been 
fully discretized in both space and time, which can be written as: 

$$r^n(w^n; \mu) = 0, \tag{2}$$

where $r$ denotes the residual, and the superscript $n$ denotes the time index, with 
$n = 1, \ldots, N_t$, so that $w^n := w(t^n)$, where $t^n$ is the $n$th timestep within a simulation 
based on (2). The high-fidelity solution $w(t)$ is approximated as a linear combination 
of the reduced basis modes: 

$$w(t) \approx \tilde{w}(t) = \Phi \hat{w}(t), \tag{3}$$

where $\hat{w}(t) \in \mathbb{R}^M$, with $M \ll N$. Given this definition, in the LSPG approach, solving 
for the ROM solution amounts to solving the following least-squares optimization 
problem: 

$$\hat{w}^n = \arg\min_{y \in \mathbb{R}^M} \| r^n(\Phi y; \mu) \|_2^2, \tag{4}$$

for $n = 1, \ldots, N_t$ and $\hat{w}^n := \hat{w}(t^n)$. Equation (4) can be solved using the Gauss-Newton 
approach following the method of Carlberg et al. (2013). Unfortunately, the 
approach described thus far is inefficient for nonlinear problems, as the solution of 
the ROM problem (4) requires algebraic operations that scale with N, the dimen- 
sion of the original FOM. This problem can be circumvented through the use of 
hyper-reduction, the basic idea of which is to compute the residual at some small 
number of points $q$ with $q \ll N$, encapsulated in a “sampling matrix” $A$ computed 
as a pre-processing step of the model reduction procedure using available snapshot 
data. The set of $q$ points is typically referred to as the “sample mesh”, and a variety 
of quasi-optimal approaches aimed at minimizing the representation error of a given 
nonlinear function appearing in the FOM residual exist; examples include the 
(discrete) empirical interpolation method (D)EIM (Barrault et al. 2004; Chaturantabut 
and Sorensen 2010), “best points” interpolation (Nguyen et al. 2008; Nguyen and 
Peraire 2008), collocation (LeGresley 2006), gappy POD (Everson and Sirovich 
1995), and p-sampling (Drmac and Gugercin 2016). These approaches approximate 
the solution to the NP-hard optimization problem of minimizing the representation 
error of a nonlinear residual using different greedy approaches. Typically, as one may expect 
based on intuition, the sample mesh points returned by these algorithms are clustered 
in regions where the simulated solution exhibits “interesting” behavior/features, e.g., 
shocks, vortices, etc. (see e.g., Fig. 18). With the introduction of hyper-reduction, 
the LSPG optimization problem takes the form 

$$\hat{w}^n = \arg\min_{y \in \mathbb{R}^M} \| A r^n(\Phi y; \mu) \|_2^2. \tag{5}$$

As illustrated in Fig. 17, the matrix $A \in \mathbb{R}^{q \times N}$ is sparse, and has the effect of 
“sub-selecting” the residual $r$ at some small number of points $q$, corresponding to the 
non-zero columns of $A$. 
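
To make problem (5) concrete, the sketch below performs one hyper-reduced LSPG timestep with a Gauss-Newton loop on a toy FOM. Everything about the setup is an assumption made for illustration: the model $dw/dt = -\mu w^3$ (applied entrywise), the backward Euler residual, the basis built from exact-solution snapshots, and the randomly chosen sample mesh. The sampling matrix $A$ is represented by the index array `sample_idx`, so that `v[sample_idx]` plays the role of $A v$.

```python
import numpy as np

def lspg_step(Phi, w_prev, dt, mu, sample_idx, n_iter=10):
    """One backward-Euler LSPG step with hyper-reduction for dw/dt = -mu * w**3."""
    y = Phi.T @ w_prev                  # initial guess: projection of the previous state
    Phi_s = Phi[sample_idx]             # rows of Phi on the sample mesh (A @ Phi)
    history = []
    for _ in range(n_iter):
        # The toy right-hand side is entrywise, so the sampled residual only
        # needs the state at the sample mesh points.
        w_s = Phi_s @ y
        r_s = w_s - w_prev[sample_idx] + dt * mu * w_s**3        # A @ r
        J_s = (1.0 + 3.0 * dt * mu * w_s**2)[:, None] * Phi_s    # A @ (dr/dw) @ Phi
        history.append(np.linalg.norm(r_s))
        dy, *_ = np.linalg.lstsq(J_s, -r_s, rcond=None)          # Gauss-Newton update
        y = y + dy
    return y, history

N, K, M, q = 200, 50, 5, 40
mu, dt = 1.0, 0.01
x = np.linspace(0.0, 1.0, N)
w0 = 1.0 + 0.5 * np.sin(2 * np.pi * x)
times = np.linspace(0.0, 1.0, K)
# Snapshots of the exact solution w(t) = w0 / sqrt(1 + 2 * mu * t * w0**2).
X = w0[:, None] / np.sqrt(1.0 + 2.0 * mu * times[None, :] * w0[:, None]**2)
Phi = np.linalg.svd(X, full_matrices=False)[0][:, :M]
sample_idx = np.random.default_rng(0).choice(N, size=q, replace=False)

y1, history = lspg_step(Phi, w0, dt, mu, sample_idx)
w_exact = w0 / np.sqrt(1.0 + 2.0 * mu * dt * w0**2)
rel_err = np.linalg.norm(Phi @ y1 - w_exact) / np.linalg.norm(w_exact)
```

The point of the sketch is the cost structure: each Gauss-Newton iteration touches only the $q$ sampled rows, so the work scales with $q$ and $M$ rather than with $N$.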

Current state-of-the-art methods employ a single static sample mesh computed 
offline, and use the same sample mesh for hyper-reduction for all the timesteps at 
which the ROM solution is computed. It has been observed that, for certain appli- 
cations, sample meshes computed using standard hyper-reduction methods (gappy 
POD (Everson and Sirovich 1995), p-sampling (Drmac and Gugercin 2016)) are 
inadequate; in particular, they yield ROMs that are less accurate than ROMs con- 
structed with a random sample mesh that knows nothing about the problem dynamics 
(Blonigan et al. 2021). 

We hypothesize herein that it may be possible to improve the accuracy of hyper- 
reduced ROMs through the creation of a set of evolving sample meshes, calculated 
using the unique features present in the solution at each time, or within time windows. 
The parallel to AMR (Berger and Oliger 1984) should be clear. To explore this idea, 
we perform a preliminary study in which we use our event detection framework to 
calculate dynamically-changing sample meshes, with readily-available snapshots of 
the FOM solution and the solution residual as features. In this approach, we use the 
analysis partitions flagged as anomalous to define the sample mesh points. 

Fig. 18 Computational domain (top) and representative sample mesh points shown in red (bottom) 
for the 2D open cavity geometry. The sample mesh was obtained using the p-sampling approach 
(Drmac and Gugercin 2016) 


Below, we present and describe some preliminary results exploring the viability 
of our proposed approach to dynamic sample mesh generation in the context of a 
problem involving a 2D viscous compressible flow with a Reynolds number of 10,000 
over an open cavity geometry, pictured in Fig. 18. To generate a FOM of the form 
(2), the governing compressible Navier-Stokes equations are discretized in space 
using a third order Discontinuous Galerkin (DG) method with 600 x 240 elements 
in the streamwise and wall-normal direction, respectively, and in time with a Crank- 
Nicolson time-stepper having a timestep of 5 x 107°. The mesh for this geometry is 
obtained by discretizing a rectangular region with a uniform 600 x 240 mesh, and 
transforming it to fit the cavity geometry of interest. More details pertaining to the 
high-fidelity discretization can be found in Parish and Carlberg (2021) and are not 
repeated here for the sake of brevity. The free-stream Mach number is unity, which 
causes a shock to form in the problem solution (see Fig. 19, top row). A POD basis is 
constructed from 1000 snapshots of the high-fidelity solution. These same snapshots 
are employed to calculate a sample mesh having 1000 points using the p-sampling 
approach. This sample mesh is shown in Fig. 18. 

The objective of the present section is to explore the viability of constructing 
dynamic sample meshes using our event detection framework. The natural choices of 
features to use for this task are the solution (Fig. 19, top row) and the solution residual 
(Fig. 19, second row). The former is a vector of the four primary conserved variables, 
$\rho$, $\rho u$, $\rho v$ and $\rho e$, where $\rho$ is the fluid density, $u$ and $v$ are the fluid velocities, and $e$ 
is the fluid energy; the latter is the residual of the governing PDEs for each of these 
variables, which contains the nonlinear terms in the governing compressible 
Navier-Stokes equations. For the purpose of the event 
detection, we partition our geometry into 150 × 60 analysis partitions, each having 
4 × 4 cells. In this preliminary study, we consider the quartile signature (Table 1), the 
dbscan measure (Table 2) with $\varepsilon = 0.3$ and N, = 1%, and the threshold 
decision with a threshold of 0.5 (Table 4). The sample meshes returned by this 
approach are plotted in Fig. 19. Row 3 of this figure shows in yellow the interesting 
partitions, which define a dynamic sample mesh, identified by our event detection 
framework at the time of snapshots 100, 500, and 928, respectively. The reader can 
observe that the dynamic sample meshes are changing in time. Additionally, the 
sample mesh points are in general concentrated within the cavity and in the vicinity 
of the shock that is seen in the density solutions (Fig. 19, top row). 

(a) Snapshot 100 (b) Snapshot 500 (c) Snapshot 928 

Fig. 19 Plots of the density solution (top row), the density residual (second row) and dynamic 
sample meshes calculated using our event detection framework (rows 3-5) for the 2D compressible 
cavity flow problem at the times of snapshots 100 (a), 500 (b) and 928 (c). In rows 3-5, sample 
mesh points are shown in yellow. The sample meshes in rows 4 and 5 are obtained by randomly 
selecting one-fourth and one-sixteenth of the points, respectively, within each interesting analysis 
partition shown in the third row 

The reader can observe by comparing the third row of Fig. 19 with Fig. 18 that the 
sample meshes identified by our event detection framework are qualitatively similar 
to the static sample mesh obtained using the p-sampling algorithm. In an effort to 
measure the quality of the dynamic sample meshes calculated using our framework, 
we calculate the following quantity given a sample mesh represented by the matrix 
$A$: 

$$\epsilon := \frac{\| w - \tilde{w}_s \|_2}{\| w \|_2}, \tag{6}$$

where $\tilde{w}_s := \Phi \hat{w}_s$ and 

$$\hat{w}_s = \arg\min_{\hat{w} \in \mathbb{R}^M} \| A w - A \Phi \hat{w} \|_2^2. \tag{7}$$

In this context, $\tilde{w}_s$ is the optimal state one can reconstruct given knowledge of only 
the FOM state at the sample mesh points. The quantity (6) has the advantage that it is 
computable offline (without running the full model reduction workflow). 
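
The quantity (6) amounts to a small least-squares solve followed by a norm ratio, as the NumPy sketch below shows. The basis, the states and the sample mesh are random stand-ins chosen for illustration, and the sampling matrix $A$ is again represented by an index array (an implementation choice assumed here).

```python
import numpy as np

def sample_mesh_error(w, Phi, sample_idx):
    """Relative error (6) of the reconstruction (7) from values on the sample mesh."""
    # Least-squares fit of the reduced coordinates to the sampled state, i.e. (7).
    w_hat, *_ = np.linalg.lstsq(Phi[sample_idx], w[sample_idx], rcond=None)
    return np.linalg.norm(w - Phi @ w_hat) / np.linalg.norm(w)

rng = np.random.default_rng(0)
N, M, q = 500, 8, 40
Phi, _ = np.linalg.qr(rng.standard_normal((N, M)))        # orthonormal stand-in for a POD basis
w_in_span = Phi @ rng.standard_normal(M)
w_general = w_in_span + 0.05 * rng.standard_normal(N)      # adds a component outside the basis
sample_idx = rng.choice(N, size=q, replace=False)          # random sample mesh

err_in_span = sample_mesh_error(w_in_span, Phi, sample_idx)
err_sampled = sample_mesh_error(w_general, Phi, sample_idx)
err_full = sample_mesh_error(w_general, Phi, np.arange(N))  # every point sampled
```

Sampling every point recovers the orthogonal projection onto the basis, which is the smallest error any sample mesh can achieve for a given state; the gap between `err_sampled` and `err_full` is therefore one way to judge a sample mesh.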

Figure 20a plots the quantity $\epsilon$ from (6) for the fluid density solution as a function 
of time for the dynamic sample meshes obtained using our approach and for the static 
sample mesh obtained using p-sampling. As noted earlier, this comparison is not 
entirely consistent, since our dynamic sample meshes contain far more points than 
the static sample mesh we are comparing to (see Fig. 20b). A very simple strategy 
for reducing the sizes of our dynamic sample meshes is to randomly drop a fixed fraction 
of the sample mesh points within each analysis partition flagged by our approach. 
Figure 19 shows the resulting sample meshes when one-quarter (fourth row) and 
one-sixteenth (fifth row) of the sample mesh points are kept within each interesting 
analysis partition. By randomly selecting just one sample mesh point within each 
interesting analysis partition (which corresponds to the one-sixteenth sub-sampling 
shown in the fifth row of Fig. 19), it is possible to reduce the sizes of our dynamic sample 
meshes to be on the order of the static sample mesh obtained through p-sampling 
(Fig. 20b). Remarkably, as the reader can see from examining Fig. 20a, reducing the 
number of sample mesh points in this way does not increase the error (6). While the 
fact that the error (6) for the dynamic sample meshes obtained using our approach is 
roughly comparable to the error of the p-sampling sample mesh may seem negative, 
it is actually encouraging, given that our approach is unsupervised and not based on an 
underlying optimization problem. Future work will focus on improving the sample 
meshes calculated using our approach, e.g., by bringing in ideas from traditional 
sample mesh approaches, which are based on minimizing the approximation error 
on a given sample mesh. Additionally, we plan to deploy our approach on test cases 
with more sophisticated dynamics, for which a dynamic sample mesh procedure will 
likely yield a greater benefit (e.g., problems with moving shocks). Future work will 
also include the design of signature-measure pairs that can guarantee that a given 
number of analysis partitions are selected at any given timestep; in order to achieve 
this, it is necessary to use a non-boolean measure. 

Fig. 20 Comparison of errors $\epsilon$ in the density solution and the sample mesh size as a function of 
time for the cavity flow problem for sample meshes calculated using our event detection framework 
versus p-sampling 


3.4 HPC Experiments 


As discussed in Sect. 1, an important requirement for an in situ event detection 
framework is that it be scalable and communication-minimizing. In this section, we 
verify the scalability of our framework in an HPC application utilizing MPI (Message 
Passing Interface Forum 1994) for coordinating the parallel communication and 
computation. In order to perform this study, we embedded a Python interpreter in 
the S3D combustion simulation code (Chen et al. 2009) which is written in Fortran 
90. References to the raw data from the Fortran side were passed to the Python 
framework at each timestep, without duplication. The mpi4py package (Dalcin et al. 
2005) was used to access the MPI environment from Python and perform collective 
communication between processors.

We ran our experiment using the Cori Cray XC40 machine at NERSC. The
simulation represented homogeneous charge compression ignition (HCCI)
combustion of an ethanol-air mixture at conditions typical of internal combustion
engines. The mixture undergoes compression heating, and auto-ignition kernels
appear locally in small pockets, as shown in Fig. 21, that lead to the eventual
combustion of the entire mixture. The goal for an event detection algorithm in this case
is to identify the partitions where the auto-ignition kernels appear.

We decomposed the 2D simulation domain into 1024 partitions, with one partition 
per MPI rank, and processed 626 snapshots with 3136 grid points per partition and 
33 features at each grid point. The event detection involved the following steps: 


• Global min-max pre-processing utilizing two MPI all-reduces, one each for the
  per-feature global min and max, over a vector of a size equal to the number of
  features.

• Mean signature on the data locally on each partition (no MPI communication
  involved).

• msd measure, which involves computing a global mean of signatures and requires
  an MPI all-reduce of a vector of size equal to the number of features.
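
The three steps above can be sketched in a few lines. This is only a serial NumPy stand-in, not the framework's code: the leading array axis plays the role of the MPI ranks, so each reduction over that axis marks where an all-reduce would occur. The definition of the msd measure used here (mean squared deviation of each signature from the global mean signature), the function name `detect_events` and the fixed-threshold decision are assumptions made for illustration.

```python
import numpy as np


def detect_events(data, threshold):
    """Serial stand-in for the three steps listed above.

    data has shape (n_partitions, n_points, n_features); each slice along
    the first axis plays the role of the data held by one MPI rank.
    """
    # Step 1: per-feature global min and max. In the parallel code each of
    # these two reductions over all partitions would be one MPI all-reduce.
    gmin = data.min(axis=(0, 1))
    gmax = data.max(axis=(0, 1))
    span = np.where(gmax > gmin, gmax - gmin, 1.0)  # guard constant features
    scaled = (data - gmin) / span

    # Step 2: mean signature, computed locally on each partition.
    signatures = scaled.mean(axis=1)  # shape (n_partitions, n_features)

    # Step 3: global mean of the signatures (one more all-reduce in parallel),
    # then the mean squared deviation of each signature from it. This reading
    # of "msd" is an assumption.
    global_mean = signatures.mean(axis=0)
    measure = ((signatures - global_mean) ** 2).mean(axis=1)

    # Hypothetical decision rule: flag partitions above a fixed threshold.
    return measure, measure > threshold


rng = np.random.default_rng(0)
data = rng.random((8, 100, 3))
data[3] += 5.0  # partition 3 behaves differently from the rest
measure, flagged = detect_events(data, threshold=0.1)
```

In this toy data set only partition 3 is flagged. Every quantity that would be communicated is a vector with one entry per feature, which is what keeps the approach communication-minimizing.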


In a previous work (Konduri et al. 2018), we used this simulation as a motivation for
designing a new signature, the feature moment metric (fmm), which represents the
distribution of a given joint statistical moment (e.g., kurtosis) across all the features.
Here our focus is only on demonstrating the parallel performance of the framework,
and hence we use the simpler mean signature.

The execution times for the solver and the event detection components were
recorded for the simulation. The solver execution time was 0.126 s for every
simulation timestep. The event detection execution time ranged from a minimum of
0.012 s per timestep to a maximum of 2.28 s, with an average of 0.2 s. Because the
workflow was identical from one timestep to the next, the large variation in the times
can be attributed to system noise. While not negligible, the average event detection
time was on the same order of magnitude as the solver time, and thus within the realm of
practicality, depending on the application. Encouragingly, the minimum time was an
order of magnitude smaller than the simulation time, suggesting that, under conditions
free of system noise, the event detection could be performed in a fraction of the
simulation time.

Fig. 21 Contour plot of heat release rate (J/m³/s) at an early instant of the HCCI
combustion simulation. A 12 × 12 partitioning of the domain is shown with white
lines, and auto-ignition kernels are denoted by regions of high (red) heat release rate

Note too that we used Python in situ to run this experiment for expediency, and 
that the analysis time could be drastically reduced by porting our framework to 
compiled code. Finally, analysis overhead could be further reduced for large-scale 
applications by reducing the number of event detection checks. Performing the event 
detection at, for instance, every Nth timestep would be an effective compromise 
between traditional check-pointing and fine-grained event detection, reducing the 
event detection load to a negligible portion of the runtime. 
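
To put rough numbers on this compromise, the sketch below estimates the share of the total runtime spent on event detection when it runs once every N solver steps. It uses the timings reported above and assumes, as a simplification, that the cost of one detection call does not depend on N; the function name `detection_share` is hypothetical.

```python
def detection_share(t_detect, t_solver, every_n=1):
    """Fraction of runtime spent on detection when it runs every `every_n` steps."""
    if every_n < 1:
        raise ValueError("every_n must be a positive integer")
    return t_detect / (every_n * t_solver + t_detect)


T_SOLVER = 0.126      # s per timestep, as reported above
T_DETECT_AVG = 0.2    # s per check, average
T_DETECT_MIN = 0.012  # s per check, minimum

share_every_step = detection_share(T_DETECT_AVG, T_SOLVER)
share_every_100 = detection_share(T_DETECT_AVG, T_SOLVER, every_n=100)
share_noise_free = detection_share(T_DETECT_MIN, T_SOLVER)
```

Under these assumptions, checking at every step with the average cost takes roughly 61% of the runtime, checking every 100th step takes under 2%, and the noise-free minimum cost at every step takes about 9%.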


4 Conclusion 


This work represents a first step in the development of event detection algorithms 
that can automatically identify events of interest in situ. Specifically, we presented a 
signatures-measures-decisions framework for the development of in situ HPC event 
detection algorithms. This framework is a useful decomposition that supports gen- 
eralizability, unsupervised detection, low communication requirements and online 
processing. We have developed components under this framework which enable the 
use of standard event detection algorithms under the aforementioned constraints, in 
addition to entirely new combinations. We illustrated how example algorithms made 
from these components can optimize I/O while running an HPC simulation, leading 
to the capture of many more interesting events than typical uniform check-pointing. 
We highlighted two additional use cases for the proposed framework: detecting inter- 
esting events in HPC simulations (the Marine Ice Sheet Instability in land ice data), 
and identifying optimal space-time subregions for the hyper-reduction step of a typi- 
cal projection-based model reduction workflow. Finally, we demonstrated, in a study 
using HPC and MPI, that in situ event detection overhead can be on the order of 
magnitude of the simulation, and performance can be improved further with minor 
adjustments. 

This work enables future research in several areas, such as the question of what 
should constitute an “interesting” event for a given simulation, or, ideally, how 
to define “interesting” for any given simulation. Apart from detecting events the 
proposed approaches can also identify numerical anomalies, which can help with 
debugging and interpretation of simulation results. In addition, it is possible that this 
framework can be used to classify events either in situ or as a post-processing tech- 
nique by analyzing the signatures themselves; the signatures distill information from 
a large number of samples and are less expensive to analyze. Finally, we hope that 
experiments done using this framework will inspire HPC simulation code developers
to incorporate these capabilities into native code, allowing for even more efficient
in situ event detection.


Abstract The accurate modelling of the unresolved stress tensor is particularly
important for Large Eddy Simulations (LES) of turbulent flows. This term affects
the transfer of energy from the largest to the smallest scales and vice versa, thus
controlling the evolution of the flow field. In reacting flows, the flow field transports
scalar fields such as mass fractions and temperature, both of which control the species
production and destruction rates. A large number of models have been developed in
past years for the stress tensor in incompressible and non-reacting flows. A common
characteristic of the majority of the classical models is that their derivation typically
involves simplifying assumptions, which limits their predictive ability. At the
same time, various tunable parameters appear in the relevant closures whose values
depend on the flow geometry, configuration and spatial location, and which require
careful regularisation. Data-driven modelling of the stress tensor is an emerging
alternative approach which may help to circumvent the above issues, and in recent
studies several such models were developed and evaluated. This chapter discusses
the modelling problem, presents some of the most popular algebraic models, and
reviews some recent advances in data-driven methods.


1 Introduction


LES is a powerful tool for simulating a wide range of flows including turbulent and 
reacting flows. Although LES is more expensive than Reynolds Averaged Navier 
Stokes (RANS) simulations, with the rapid advances of fast and efficient computer 
hardware and scalable but also readily available software, LES is increasingly being 
used in a wide range of industries (aerospace, automotive, energy, chemical) for 
modelling fluid flows in complex and often realistic-size geometries (Gicquel et al. 
2012; Pitsch 2006). In comparison to Direct Numerical Simulations (DNS) where all 
length and time-scales are resolved, LES reduces the computational load substantially 
by resolving only the largest scales. 

LES comes in two main flavours: implicit and explicit (Gicquel et al. 2012; Sagaut 
2001). In implicit LES, the filtering is essentially done through the numerical scheme 
whereby the goal is to obtain steady or at least bounded solutions for a given mesh 
size/time-step. In explicit LES, a spatial filter having a width $\Delta$ is applied to the
governing equations, and unresolved terms appearing in the resulting equation set are
modelled explicitly. This is done either by developing suitable algebraic functions
involving the resolved variables on the mesh, and/or by developing and solving
suitable transport equations. In the majority of classic approaches the ratio of the mesh
spacing $h$ to the filter width is $h/\Delta = 1$, but this need not necessarily be the case, as
we discuss later on. Each of these two approaches has its merits and drawbacks, and in
this chapter we focus on explicit LES, which solves the filtered equations. The filtered
compressible momentum equation reads,

$$\frac{\partial \bar{\rho}\tilde{u}_i}{\partial t} + \frac{\partial \bar{\rho}\tilde{u}_i\tilde{u}_j}{\partial x_j} = -\frac{\partial \bar{p}}{\partial x_i} + \frac{\partial \bar{\tau}_{ij}}{\partial x_j} - \frac{\partial \tau_{ij}}{\partial x_j}, \tag{1}$$


where the overbar denotes spatial filtering using a suitable filter i.e.


$$\bar{\phi}(\mathbf{x},t) = \int G(\mathbf{x} - \mathbf{x}'; \Delta)\,\phi(\mathbf{x}',t)\,d\mathbf{x}', \tag{2}$$


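where $G$ is the LES filter and $\phi$ the quantity being filtered. Note that the tilde denotes
Favre-filtering i.e. $\tilde{\phi} = \overline{\rho\phi}/\bar{\rho}$. The resolved and unresolved stress tensors $\bar{\tau}_{ij}$ and $\tau_{ij}$
are given by,


$$\bar{\tau}_{ij} = \overline{\mu\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} - \frac{2}{3}\delta_{ij}\frac{\partial u_k}{\partial x_k}\right)}, \tag{3}$$


and


$$\tau_{ij} = \bar{\rho}\left(\widetilde{u_i u_j} - \tilde{u}_i\tilde{u}_j\right), \tag{4}$$

respectively. The resolved stress tensor is typically closed using the gradients of
the filtered velocity components (it is called resolved not because it is actually
resolved, but because the approximation below is such a good one),


$$\bar{\tau}_{ij} \simeq \bar{\mu}\left(\frac{\partial \tilde{u}_i}{\partial x_j} + \frac{\partial \tilde{u}_j}{\partial x_i} - \frac{2}{3}\delta_{ij}\frac{\partial \tilde{u}_k}{\partial x_k}\right) = 2\bar{\mu}\left(\tilde{S}_{ij} - \frac{1}{3}\delta_{ij}\tilde{S}_{kk}\right), \tag{5}$$


where


$$\tilde{S}_{ij} = \frac{1}{2}\left(\frac{\partial \tilde{u}_i}{\partial x_j} + \frac{\partial \tilde{u}_j}{\partial x_i}\right), \tag{6}$$


is the (resolved) rate of strain tensor. Clearly $\tau_{ij}$ is an unclosed term and requires
modelling in order to produce a closed equation set. This term is very important since it
determines the dissipation/back-scatter of kinetic energy (Sagaut 2001): multiplying
Eq. 1 by $\tilde{u}_i$ and summing over $i$, it is straightforward to show that the contribution of
the unresolved stress tensor to the resolved total kinetic energy $e_k = \frac{1}{2}\tilde{u}_i\tilde{u}_i$ is
$-\tilde{u}_i\,\partial\tau_{ij}/\partial x_j$.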
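
The kinetic-energy statement above can be made explicit. The following is a sketch of the two steps involved, assuming the filtered continuity equation $\partial\bar{\rho}/\partial t + \partial(\bar{\rho}\tilde{u}_j)/\partial x_j = 0$ (not written out in the text) and the symmetry of the unresolved stress tensor:

```latex
\begin{align}
\tilde{u}_i\left[\frac{\partial \bar{\rho}\tilde{u}_i}{\partial t}
  + \frac{\partial \bar{\rho}\tilde{u}_i\tilde{u}_j}{\partial x_j}\right]
  &= \frac{\partial \bar{\rho} e_k}{\partial t}
   + \frac{\partial \bar{\rho}\tilde{u}_j e_k}{\partial x_j},
  \qquad e_k = \tfrac{1}{2}\tilde{u}_i\tilde{u}_i, \\
-\tilde{u}_i\frac{\partial \tau_{ij}}{\partial x_j}
  &= -\frac{\partial (\tilde{u}_i\tau_{ij})}{\partial x_j}
   + \tau_{ij}\frac{\partial \tilde{u}_i}{\partial x_j}
   = -\frac{\partial (\tilde{u}_i\tau_{ij})}{\partial x_j}
   + \tau_{ij}\tilde{S}_{ij}.
\end{align}
```

The first term on the right of the second line only redistributes energy in space, whereas the sign of $\tau_{ij}\tilde{S}_{ij}$ decides whether the resolved scales lose energy (dissipation, negative values) or gain it (back-scatter, positive values).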

A large number of different models have been developed in the literature throughout
the years for $\tau_{ij}$, aimed mainly at incompressible and non-reacting flows (Meneveau
and Katz 2000). In the classic modelling approach, the stress tensor is modelled
by developing suitable algebraic functions of the resolved quantities. In incompressible
flows for instance, these include the filtered velocity components $\bar{u}_i$ as well as
any other derived quantities such as their gradients and/or functions of their
gradients, higher-order filtered values of the aforementioned quantities etc. The majority
of these models are relatively straightforward to implement while the computational 
cost depends on the formulation: the dynamic evaluation of model parameters can be 
substantially more expensive than the static approach (where a constant value for a 
certain parameter is assumed). A common characteristic of all of the aforementioned 
models is that they usually involve some simplifying assumption in their development 
which may or may not be valid for conditions other than those originally developed 
for. For example, the Boussinesq assumption is a rather strong one (Schmitt 2007). 
Previous theoretical as well as experimental work showed that this assumption is 
invalid both for non-reacting (Tao et al. 2000, 2002) and reacting flows (Klein et al. 
2015; Pfandler et al. 2010). Another issue with classic algebraic models is that they 
involve tunable parameters whose spatio-temporal variation depends on the flow 
regime and/or reaction mode. As a result, a single universal method for accurate 
parameterisation/regularisation of the models’ constants is difficult to obtain. 

Despite the aforementioned issues, the standard approach in reacting LES is 
to employ models originally developed and validated for incompressible and non- 
reacting flows. Reacting flows, however, bring additional challenges. The heat release 
causes large variations in density, temperature, velocity, and viscosity across the 
flame-front. All of these quantities affect the modelling of the stress tensor. Mod- 
els developed for non-reacting and incompressible flows do not account for such 
effects. For instance, it was shown in Klein et al. (2015), as well as in previous
theoretical and experimental studies (Bray et al. 1981; Chomiak and Nisbet 1995),
that even for simple flow configurations such as freely-propagating premixed flames
classic models are inadequate. In particular, it was shown (Klein et al. 2015) that
counter-gradient transport also occurs for the components of the stress tensor, and
as a result classic static gradient-type models cannot capture it. Even dynamic models,
where the sign of the dynamic parameter can in principle change, fail to capture
counter-gradient transport (Klein et al. 2015). In addition, it was shown in Klein et al.
(2015) that the standard averaging procedure for regularising the dynamic parameters,
e.g. $C_D$ in the Smagorinsky model, is not suitable for reacting flows. The behaviour
and performance of these models for more demanding configurations such as
shear-induced flows with a larger spatial inhomogeneity is unclear, and the deficiencies
of such models can only be unveiled through further investigation using both a priori
as well as a posteriori studies. All of these issues essentially limit the predictive
ability of LES to conditions where the models for the unresolved terms are known to
perform well.

In light of the aforementioned long-standing issues, in the past few years a wide 
range of alternative non-classic modelling strategies have been proposed and eval- 
uated (Domingo et al. 2020) including machine-learning which has the potential to 
circumvent such issues. Data-driven methods which include a wide range of network 
architectures have been widely used to solve classification and regression problems 
in image recognition (Krizhevsky et al. 2012), text translation (Sutskever et al. 2014), 
decision making (Mnih et al. 2015; Silver et al. 2016), gene profiling (Khan et al. 
2001) etc. by directly exploiting the abundance of information contained within very 
large data sets. In the field of fluid mechanics databases are also quite substantial:
DNS databases of non-reacting flows for instance are of the order of petabytes (Kanov
et al. 2015). In reacting flows, simulations using DNS with detailed chemistry and 
multi-step reduced chemistry are slowly yet steadily becoming more common (Asp- 
den et al. 2016; Minamoto et al. 2011; Nikolaou and Swaminathan 2014, 2015; 
Wang et al. 2017) while numerical solvers are being developed for DNS aimed at 
the exascale (Treichler et al. 2017) and exploiting hybrid architectures (Perez et al. 
2018). As a result, the application of machine-learning techniques using data from 
such high-fidelity simulations for modelling purposes in LES appears to be a timely 
one. 

In the text which follows we present in Sect. 2 some fundamental/popular models 
in the literature which have been the subject of recent and extensive testing in reacting 
flows (Nikolaou et al. 2019, 2021). In Sect. 3 another emerging approach namely 
deconvolution is discussed, and in Sect. 4 a review of the main approaches used 
for machine-learning is given. The main challenges and caveats associated with 
machine-learning methods are summarised in Sect. 6.


2 Classic Stress Tensor Models


2.1 Smagorinsky 


The Smagorinsky model is an eddy-diffusivity type of model originally developed 
for application to atmospheric flows (Moin et al. 1991; Smagorinsky 1963). The 
stress tensor closure reads, 


$$\tau_{ij} - \frac{1}{3}\delta_{ij}\tau_{kk} = -2\bar{\rho}\nu_t\left(\tilde{S}_{ij} - \frac{1}{3}\delta_{ij}\tilde{S}_{kk}\right), \tag{7}$$


where the turbulent viscosity $\nu_t$ is modelled using $\nu_t = C_D\Delta^2|\tilde{S}|$ with
$|\tilde{S}| = \sqrt{2\tilde{S}_{ij}\tilde{S}_{ij}}$. In the original (static) version $C_D$ is replaced by $C_S^2$ with $C_S \simeq 0.2$. It
is a very popular model as it is relatively straightforward to implement and
computationally efficient. However, from a theoretical point of view there are some key
issues to highlight. Firstly, it is a purely dissipative model, whereas a reverse flow of energy
(backscatter) is known to exist from the smaller scales to the larger scales both in 2D
flows as shown by Fjortof (1953) and in 3D flows (Domaradzki et al. 1993; Kerr et al.
1996; Piomelli et al. 1991). In addition, the assumption of the unresolved stress tensor
being aligned with the resolved rate of strain tensor is a rather strong one as shown by
previous experimental and numerical studies (Tao et al. 2000, 2002). Another issue
is that the model predictions are sensitive to the value of $C_S$ (Smagorinsky constant),
which depends on the flow regime (Deardoff 1970; Lilly 1966), but also on the filter
width and mesh spacing (Mason and Callen 1986).
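
As a concrete illustration of the static closure, the sketch below evaluates the eddy viscosity $(C_S\Delta)^2|\tilde{S}|$ from a velocity gradient tensor. It is a minimal NumPy example rather than code from any solver; the function name, the array layout (`grad_u[..., i, j]` holding the derivative of velocity component i with respect to coordinate j) and the sample values are assumptions made here.

```python
import numpy as np


def smagorinsky_viscosity(grad_u, delta, c_s=0.2):
    """Static Smagorinsky eddy viscosity nu_t = (c_s * delta)**2 * |S|.

    grad_u[..., i, j] holds d(u_i)/d(x_j); leading axes (if any) index
    grid points. |S| is defined as sqrt(2 * S_ij * S_ij).
    """
    grad_u = np.asarray(grad_u, dtype=float)
    strain = 0.5 * (grad_u + np.swapaxes(grad_u, -1, -2))
    strain_norm = np.sqrt(2.0 * np.sum(strain * strain, axis=(-2, -1)))
    return (c_s * delta) ** 2 * strain_norm


# Pure shear du_1/dx_2 = 10 with a filter width of 0.1 (arbitrary units).
shear = np.zeros((3, 3))
shear[0, 1] = 10.0
nu_t = smagorinsky_viscosity(shear, delta=0.1)
```

For pure shear $|\tilde{S}|$ equals the shear rate, so this example gives $\nu_t = (0.2 \times 0.1)^2 \times 10 = 0.004$. The result can never be negative, which is the purely dissipative behaviour noted above.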

These limitations soon became apparent, with the static Smagorinsky model
performing relatively well for homogeneous and isotropic decaying turbulence but
poorly for shear-dominated flows such as turbulent channel flow. In such
configurations the value $C_S \simeq 0.2$ in the near-wall region was found to be excessive, and
a reduction was required to obtain the correct (lower) dissipation. This led to the
development of a dynamic version by Germano et al. (1991) where $C_D$ was no longer
constant but calculated dynamically (during the simulation) from the resolved flow
variables. The dynamic Smagorinsky model showed considerable improvement over
its static version, particularly in shear flows (Germano et al. 1991), and was later
adapted to compressible flows by Moin et al. (1991), whereby $C_D$ is typically
calculated using the least-squares approach (Lilly 1992; Salvetti 1994),


$$C_D = \frac{\left\langle -\left(L_{ij} - \frac{1}{3}\delta_{ij}L_{kk}\right)M_{ij}\right\rangle}{\left\langle 2\Delta^2 M_{ij}M_{ij}\right\rangle}, \tag{8}$$


where $\langle\cdot\rangle$ indicates a suitable averaging (regularisation) procedure, and $\hat{\cdot}$ indicates
test-filtering with a filter of width $\hat{\Delta}$. The ratio $\gamma = \hat{\Delta}/\Delta$ is typically taken to equal 2. The
Leonard term $L_{ij}$ is given by,

$$L_{ij} = \widehat{\bar{\rho}\tilde{u}_i\tilde{u}_j} - \widehat{\bar{\rho}\tilde{u}_i}\,\widehat{\bar{\rho}\tilde{u}_j}/\hat{\bar{\rho}}, \tag{9}$$

and

$$M_{ij} = \gamma^2\hat{\bar{\rho}}\,|\hat{\tilde{S}}|\left(\hat{\tilde{S}}_{ij} - \frac{1}{3}\delta_{ij}\hat{\tilde{S}}_{kk}\right) - \widehat{\bar{\rho}\,|\tilde{S}|\left(\tilde{S}_{ij} - \frac{1}{3}\delta_{ij}\tilde{S}_{kk}\right)}. \tag{10}$$


An important point to note is that the Smagorinsky model does not apply to the
normal (isotropic) components of the stress tensor. Typically, the static Yoshizawa
approximation is used to explicitly model $\tau_{kk}$ (Yoshizawa 1986) as follows,


$$\tau_{kk} = 2\bar{\rho}\,C_I\Delta^2|\tilde{S}|^2, \tag{11}$$


where in the static version the model parameter $C_I$ is a constant. Yoshizawa suggested
a value of approximately 0.089 (Yoshizawa 1986), however values ranging from 0.0025 to 0.009
were reported while dynamically evaluating $C_I$ in the study of Moin et al. (1991).
In the dynamic version, $C_I$ is calculated using (Moin et al. 1991),


$$C_I = \frac{\langle L_{kk}\rangle}{\langle P\rangle}, \tag{12}$$


where $L_{kk}$ is the trace of the Leonard term, and the term $P$ is given by,

$$P = 2\left(\hat{\bar{\rho}}\,\hat{\Delta}^2|\hat{\tilde{S}}|^2 - \Delta^2\,\widehat{\bar{\rho}\,|\tilde{S}|^2}\right).$$

From the equations just presented above it becomes apparent that even for a sim- 
ple model like Smagorinsky the evaluation can be rather complicated: it involves 
the calculation of tensor variables which include gradients, and filtering as well as 
test-filtering operations, a process which introduces an additional ad-hoc parameter 
(test-filter to filter-width ratio) etc. It is also important to note that a regularization
procedure for the evaluation of dynamic parameters is almost always required to
render them spatially smooth, thus avoiding numerical instabilities. This process is
not always unique or justifiable, and typically involves averaging in homogeneous
directions (if any) or, if no homogeneous directions exist, thresholding, smoothing,
or otherwise. Other more practical issues pertain to the division by near-zero numbers,
as in the equations for $C_D$, $C_I$ and so on.
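
The sketch below shows one possible way of handling these practical points when evaluating a least-squares coefficient of the form of Eq. 8. The averaging over all points (which presumes every direction is homogeneous), the clipping bounds, the zero fallback for a vanishing denominator and the function name are all assumptions chosen for illustration, not a recommended regularisation.

```python
import numpy as np


def dynamic_coefficient(L, M, delta, eps=1e-12, c_min=0.0, c_max=None):
    """Regularised least-squares coefficient in the form of Eq. 8.

    L and M have shape (n_points, 3, 3). The averages are taken over all
    points, and the result is clipped to [c_min, c_max].
    """
    L = np.asarray(L, dtype=float)
    M = np.asarray(M, dtype=float)
    trace = np.trace(L, axis1=-2, axis2=-1)
    L_dev = L - trace[..., None, None] * np.eye(3) / 3.0  # deviatoric part

    numerator = np.mean(-np.sum(L_dev * M, axis=(-2, -1)))
    denominator = np.mean(2.0 * delta**2 * np.sum(M * M, axis=(-2, -1)))

    # Guard against division by a near-zero number (hypothetical fallback).
    if abs(denominator) < eps:
        return 0.0
    upper = np.inf if c_max is None else c_max
    return float(np.clip(numerator / denominator, c_min, upper))


# Synthetic check: build L from a known coefficient and recover it.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3, 3))
M = 0.5 * (A + np.swapaxes(A, -1, -2))
M -= np.trace(M, axis1=-2, axis2=-1)[:, None, None] * np.eye(3) / 3.0
c_true, delta = 0.03, 0.1
L = -2.0 * c_true * delta**2 * M + 0.3 * np.eye(3)  # isotropic part is ignored
c_est = dynamic_coefficient(L, M, delta)
```

Clipping at zero removes negative values of the coefficient and therefore also suppresses any back-scatter the dynamic procedure might otherwise predict, which is one reason such choices are not always justifiable.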


2.2 Scale Similarity 


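Consider an incompressible flow, in which case the unresolved stress tensor is now
simply $\tau_{ij} = \overline{u_i u_j} - \bar{u}_i\bar{u}_j$. The closure problem reduces to finding a suitable
approximation for $\overline{u_i u_j}$. Consider $u'_i = u_i - \bar{u}_i$ i.e. the difference between the unfiltered and
filtered fields. Then we have upon expansion of the filtered product,

$$\begin{aligned}
\overline{u_i u_j} &= \overline{(\bar{u}_i + u'_i)(\bar{u}_j + u'_j)} \\
&= \overline{\bar{u}_i\bar{u}_j} + \overline{\bar{u}_i u'_j} + \overline{\bar{u}_j u'_i} + \overline{u'_i u'_j} \\
&= \overline{\bar{u}_i\bar{u}_j} + \overline{\bar{u}_i(u_j - \bar{u}_j)} + \overline{\bar{u}_j(u_i - \bar{u}_i)} + \overline{(u_i - \bar{u}_i)(u_j - \bar{u}_j)}.
\end{aligned} \tag{13}$$


Up to this point the expansion is exact; however, the problem has not disappeared
since we are left with further unclosed terms, namely the last three terms in
the equation above. The main step which follows in scale-similarity models to solve
this problem is to assume that (Bardina et al. 1983),


$$\overline{\bar{u}_i(u_j - \bar{u}_j)} \simeq \bar{\bar{u}}_i\,\overline{(u_j - \bar{u}_j)} = \bar{\bar{u}}_i(\bar{u}_j - \bar{\bar{u}}_j), \tag{14}$$


and that,


$$\overline{(u_i - \bar{u}_i)(u_j - \bar{u}_j)} \simeq \overline{(u_i - \bar{u}_i)}\;\overline{(u_j - \bar{u}_j)} = (\bar{u}_i - \bar{\bar{u}}_i)(\bar{u}_j - \bar{\bar{u}}_j), \tag{15}$$


i.e. essentially that the filtering operation distributes over the individual factors
of each product. The above assumptions eventually lead to,


$$\tau_{ij} = \overline{\bar{u}_i\bar{u}_j} - \bar{\bar{u}}_i\bar{\bar{u}}_j, \tag{16}$$


which is the scale-similarity model (SIMB) of Bardina for incompressible flows
(Bardina et al. 1983). The compressible version, derived following analogous arguments,
reads,


$$\tau_{ij} = \bar{\rho}\left(\widetilde{\tilde{u}_i\tilde{u}_j} - \tilde{\tilde{u}}_i\tilde{\tilde{u}}_j\right). \tag{17}$$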
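
For the incompressible case, the step from the assumptions of Eqs. 14 and 15 to the model of Eq. 16 is short algebra. A sketch, using only the expansion of Eq. 13 and those two assumptions (with Eq. 14 applied to both cross terms):

```latex
\begin{align}
\overline{u_i u_j}
  &\simeq \overline{\bar{u}_i\bar{u}_j}
   + \bar{\bar{u}}_i(\bar{u}_j - \bar{\bar{u}}_j)
   + \bar{\bar{u}}_j(\bar{u}_i - \bar{\bar{u}}_i)
   + (\bar{u}_i - \bar{\bar{u}}_i)(\bar{u}_j - \bar{\bar{u}}_j) \\
  &= \overline{\bar{u}_i\bar{u}_j} + \bar{u}_i\bar{u}_j
   - \bar{\bar{u}}_i\bar{\bar{u}}_j, \\
\tau_{ij} = \overline{u_i u_j} - \bar{u}_i\bar{u}_j
  &\simeq \overline{\bar{u}_i\bar{u}_j} - \bar{\bar{u}}_i\bar{\bar{u}}_j .
\end{align}
```

The cross terms cancel when the products are multiplied out, so the modelled stress has the same form as the exact one with every velocity replaced by its filtered counterpart.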


Scale-similarity models are able to predict backscatter, unlike the static Smagorinsky
model; however, when applied to LES they have long been known to provide
insufficient dissipation, clearly a result of the assumptions involving the filtering
operations. In an attempt to improve the predictions of the scale-similarity model, Anderson
and Domaradzki proposed an improved version (Anderson and Domaradzki 2012).
Based on the Inter-Scale Energy transfer model of Anderson and Domaradzki (2012),
Klein et al. (2015) then suggested a modified version for application to reacting flows
(SIMET). This model reads,


o E AAS 
Tij = p (iia; + ia — Uitj -iå x (18) 


In fact, there exist a plethora of scale-similarity models in the literature and a 
common characteristic of the majority of them is insufficient dissipation. As a result, 
the most usual application of scale-similarity models is in mixed models. In such 
models as the name suggests different models are mixed together with the most 
usual approach being the addition of an eddy-diffusivity type of model (typically 
Smagorinsky) to a scale-similarity model in order to provide sufficient dissipation. 




2.3 Gradient Model 


The gradient model (GRAD) can be derived by expanding the filtered velocity product
in the expression for $\tau_{ij}$ in a Taylor series (Vreman et al. 1996) and retaining the
leading term in the expansion (Clark 1979), leading to,


$$\tau_{ij} = \bar{\rho}\,\frac{\Delta^2}{12}\frac{\partial \tilde{u}_i}{\partial x_k}\frac{\partial \tilde{u}_j}{\partial x_k}. \tag{19}$$


Models of the above kind typically give very good results in a priori studies,
provided the filter width is sufficiently small so that the contribution from the terms
dropped in the Taylor series expansion is small. However, like the scale-similarity
models, gradient-type models were also found to provide insufficient dissipation in
LES, and as a result they are mainly used in mixed models. An interesting point with
the gradient model is that it is essentially a low-order deconvolution-based model
(discussed later on).
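
A small a priori test illustrates why the gradient model performs well when the filter width is small. The sketch below is a one-dimensional, constant-density, periodic toy problem with a Gaussian filter applied in Fourier space; the test signal, the filter width and the helper names are assumptions chosen for illustration only.

```python
import numpy as np


def gaussian_filter_periodic(f, dx, delta):
    """Gaussian filter of width delta (second moment delta**2 / 12), via FFT."""
    k = 2.0 * np.pi * np.fft.fftfreq(f.size, d=dx)
    return np.real(np.fft.ifft(np.fft.fft(f) * np.exp(-(k * delta) ** 2 / 24.0)))


def ddx_periodic(f, dx):
    """Spectral first derivative of a periodic signal."""
    k = 2.0 * np.pi * np.fft.fftfreq(f.size, d=dx)
    return np.real(np.fft.ifft(1j * k * np.fft.fft(f)))


n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
delta = 0.2
u = np.sin(x) + 0.5 * np.cos(2.0 * x)  # stand-in for an unfiltered velocity

u_bar = gaussian_filter_periodic(u, dx, delta)
tau_exact = gaussian_filter_periodic(u * u, dx, delta) - u_bar * u_bar
tau_grad = delta**2 / 12.0 * ddx_periodic(u_bar, dx) ** 2

rel_error = np.max(np.abs(tau_grad - tau_exact)) / np.max(np.abs(tau_exact))
```

For this smooth signal the relative error is well below a few percent, consistent with the dropped terms being of higher order in the filter width. Agreement in such an a priori test says nothing about the dissipation the model provides in an actual LES.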


2.4 Clark Model 


Vreman et al. (1996) built upon the mixed model of Clark (1979) to produce the 
following dynamic mixed model, 


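$$\tau_{ij} = \bar{\rho}\,\frac{\Delta^2}{12}\frac{\partial \tilde{u}_i}{\partial x_k}\frac{\partial \tilde{u}_j}{\partial x_k} - C_C\,\bar{\rho}\,\Delta^2|S'|S'_{ij}, \tag{20}$$

where

$$S'_{ij} = \frac{\partial \tilde{u}_i}{\partial x_j} + \frac{\partial \tilde{u}_j}{\partial x_i} - \frac{2}{3}\delta_{ij}\frac{\partial \tilde{u}_k}{\partial x_k} = 2\left(\tilde{S}_{ij} - \frac{1}{3}\delta_{ij}\tilde{S}_{kk}\right), \tag{21}$$


and $|S'| = \left(\frac{1}{2}S'_{ij}S'_{ij}\right)^{1/2}$. In the static version $C_C = 0.172$ and in the dynamic version
it is calculated using,


$$C_C = \frac{\langle M_{ij}(L_{ij} - H_{ij})\rangle}{\langle M_{ij}M_{ij}\rangle}. \tag{22}$$

Denoting $v_i = \widehat{\bar{\rho}\tilde{u}_i}/\hat{\bar{\rho}}$, the tensors $H_{ij}$ and $M_{ij}$ are given by

$$H_{ij} = \frac{\hat{\Delta}^2}{12}\,\hat{\bar{\rho}}\,\frac{\partial v_i}{\partial x_k}\frac{\partial v_j}{\partial x_k} - \frac{\Delta^2}{12}\,\widehat{\left(\bar{\rho}\,\frac{\partial \tilde{u}_i}{\partial x_k}\frac{\partial \tilde{u}_j}{\partial x_k}\right)}, \tag{23}$$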


and


The Clark model is a mixed model, with the first part consisting of a gradient
component and the second consisting of a Smagorinsky-type component to provide
the necessary dissipation. This model gave good results for the temporal mixing layer
in Vreman et al. (1996, 1997) and was also one of the models selected for testing in
Nikolaou et al. (2021) in order to elucidate any difference with the gradient model
and to shed light on whether the eddy-diffusivity part improves the predictions or
not.


2.5 Wall-Adapting Local Eddy-Viscosity (WALE) 


This model was used to simulate a wall-impinging jet with overall good results in 
Lodato et al. (2009). It is a mixed model with a Smagorinsky-type component and a 
scale-similarity component, 


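$$\tau_{ij} - \frac{1}{3}\delta_{ij}\tau_{kk} = -2\bar{\rho}\nu_t\left(\tilde{S}_{ij} - \frac{1}{3}\delta_{ij}\tilde{S}_{kk}\right) + \bar{\rho}\left(\widehat{\tilde{u}_i\tilde{u}_j} - \hat{\tilde{u}}_i\hat{\tilde{u}}_j\right). \tag{25}$$


The turbulent viscosity is calculated from the velocity gradient and shear rate tensors
using,


$$\nu_t = (C_w\Delta)^2\,\frac{\left(\tilde{s}^d_{ij}\tilde{s}^d_{ij}\right)^{3/2}}{\left(\tilde{S}_{ij}\tilde{S}_{ij}\right)^{5/2} + \left(\tilde{s}^d_{ij}\tilde{s}^d_{ij}\right)^{5/4}}. \tag{26}$$


The model constant $C_w = 0.5$, and $\tilde{s}^d_{ij}$ is the traceless symmetric part of the squared
resolved velocity gradient tensor $\tilde{g}_{ij} = \partial\tilde{u}_i/\partial x_j$,


$$\tilde{s}^d_{ij} = \frac{1}{2}\left(\tilde{g}^2_{ij} + \tilde{g}^2_{ji}\right) - \frac{1}{3}\delta_{ij}\tilde{g}^2_{kk}, \tag{27}$$

where $\tilde{g}^2_{ij} = \tilde{g}_{ik}\tilde{g}_{kj}$. Note that in this case as well, the static Yoshizawa closure is
used to model the trace of the stress tensor as discussed above.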
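
The eddy-viscosity part of this model can be evaluated pointwise from the velocity gradient tensor alone. The sketch below is a minimal NumPy illustration of the viscosity of Eq. 26 for a single 3 × 3 gradient tensor; the function name, the zero fallback for a vanishing denominator and the two test flows are assumptions made here, and the scale-similarity part of Eq. 25 is not included.

```python
import numpy as np


def wale_viscosity(grad_u, delta, c_w=0.5):
    """WALE-type eddy viscosity for one 3x3 velocity gradient tensor.

    grad_u[i, j] holds d(u_i)/d(x_j).
    """
    g = np.asarray(grad_u, dtype=float)
    strain = 0.5 * (g + g.T)
    g2 = g @ g  # squared velocity gradient tensor, g_ik g_kj
    s_d = 0.5 * (g2 + g2.T) - np.trace(g2) / 3.0 * np.eye(3)

    sd_sd = np.sum(s_d * s_d)
    s_s = np.sum(strain * strain)
    denominator = s_s**2.5 + sd_sd**1.25
    if denominator == 0.0:
        return 0.0  # no velocity gradients at all (hypothetical fallback)
    return (c_w * delta) ** 2 * sd_sd**1.5 / denominator


# Pure shear: the squared gradient tensor vanishes, so nu_t is zero.
shear = np.zeros((3, 3))
shear[0, 1] = 10.0
nu_shear = wale_viscosity(shear, delta=0.1)

# Solid-body rotation about the third axis: zero strain, non-zero nu_t.
rotation = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
nu_rotation = wale_viscosity(rotation, delta=0.1)
```

The vanishing viscosity in pure shear is in contrast to a Smagorinsky-type viscosity, which is proportional to the strain rate magnitude and would be non-zero for the same input.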


3 Deconvolution-Based Modelling 


Deconvolution methods were probably first introduced in fluid mechanics research
in the works of Leonard and Clark (Clark 1979; Leonard 1974). Deconvolution aims
to invert the filtering operation in LES in order to obtain an approximation of the
unfiltered field $\phi^*$ from the filtered field $\bar{\phi}$ which is resolved by the LES. Then, the
filtered non-linear functions of $\phi$ can be approximated using the deconvoluted fields
i.e. $\overline{f(\phi)} \simeq \overline{f(\phi^*)}$. In the case of the unresolved stress tensor, $\tau_{ij}$ is a function of
the three velocity components, therefore the term is closed using
$\tau_{ij} \simeq \bar{\rho}\left(\widetilde{u^*_i u^*_j} - \tilde{u}_i\tilde{u}_j\right)$. Since the deconvolution operation is a purely mathematical
operation relating filtered and unfiltered fields, such methods do not include any
assumptions and/or any modelling parameters/constants. As a result, in principle, they
can be used to model a wide range of unresolved terms in the governing equations for
different flow configurations including both reacting and non-reacting flows. The
deconvolution can be accomplished with (a) approximate methods, (b) iterative methods
and (c) machine-learning.

Approximate methods are based on truncated Taylor series expansions of the 
inverse filtering operation. This approach was used to derive explicit algebraic mod- 
els for the Reynolds stresses in non-reacting flows (Domaradzki and Saiki 1997; 
Geurts 1997). In the works of Stolz and Adams (1999) an Approximate Deconvolu- 
tion Method (ADM) based on a truncated expansion of the inverse filter operation 
was used, and the deconvoluted signal was then explicitly filtered to obtain closures 
for the Reynolds stresses. The method was later used by the same authors to model 
the Reynolds stress terms in wall-bounded flows as well (Stolz and Adams 2001) 
where classic models such as the static Smagorinsky model are otherwise too dissi- 
pative. Approximate deconvolution methods have also been applied to reacting flows 
(Domingo and Vervisch 2015, 2017; Mathew 2002; Mehl and Fiorina 2017) with 
overall good results. 

Iterative deconvolution methods include the use of reconstruction algorithms such 
as van Cittert iterations (Nikolaou et al. 2019; Nikolaou and Vervisch 2018; Nikolaou 
et al. 2018) or otherwise (Wang and Ihme 2017). The classic van Cittert algorithm 
with a constant coefficient b reads, 


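$$\phi^{*n+1} = \phi^{*n} + b\left(\bar{\phi} - G * \phi^{*n}\right), \tag{28}$$


where $\phi^{*0} = \bar{\phi}$, and $\phi^{*n}$ is the approximation of the un-filtered field for a given
iteration count. In the case $\phi = \rho u_i$ and $\phi = \rho$ with $b = 1$ (typical value), the first
two iterations result in the following approximations for the unfiltered density and
density-velocity product,


$$\rho^{*0} = \bar{\rho}, \qquad \rho^{*1} = 2\bar{\rho} - \bar{\bar{\rho}},$$
$$\{\rho u_i\}^{*0} = \overline{\rho u_i}, \qquad \{\rho u_i\}^{*1} = 2\,\overline{\rho u_i} - \overline{\overline{\rho u_i}}.$$


The $n$th approximation of $\rho u_i u_j$ is calculated using $\{\rho u_i u_j\}^{*n} = \{\rho u_i\}^{*n}\{\rho u_j\}^{*n}/\rho^{*n}$,
and the corresponding approximation of the unresolved stress tensor is calculated
using $\tau^n_{ij} = \bar{\rho}\left(\overline{\{\rho u_i u_j\}^{*n}}/\bar{\rho} - \tilde{u}_i\tilde{u}_j\right)$. It is straightforward to show that the first two
are,

$$\tau^0_{ij} = \overline{\left(\frac{\overline{\rho u_i}\;\overline{\rho u_j}}{\bar{\rho}}\right)} - \bar{\rho}\tilde{u}_i\tilde{u}_j,$$

$$\tau^1_{ij} = \overline{\left(\frac{4\,\overline{\rho u_i}\;\overline{\rho u_j} - 2\,\overline{\rho u_i}\;\overline{\overline{\rho u_j}} - 2\,\overline{\overline{\rho u_i}}\;\overline{\rho u_j} + \overline{\overline{\rho u_i}}\;\overline{\overline{\rho u_j}}}{2\bar{\rho} - \bar{\bar{\rho}}}\right)} - \bar{\rho}\tilde{u}_i\tilde{u}_j.$$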

Note that for n = 0, a Bardina-like scale-similarity model is recovered. For n = 1 
an extended similarity-like model is obtained which involves double and triple- 
filtered quantities and so on for higher-order approximations. Successive iterations 
lead to higher-order approximations of the unfiltered fields and of the unresolved 
stress tensor as shown by Stolz and Adams (2001). For example, four iterations are 
sufficient to recover the gradient model supplemented by the next term in the series 
(Eq. B9 in Stolz and Adams 1999). 

It is important to note that deconvolution methods only recover wavenumbers
which are resolved by the LES mesh. As a result, deconvolution methods require
$h/\Delta < 1$ so that scales smaller than $\Delta$ can be recovered. The van Cittert
algorithm is linear, and for periodic signals it is straightforward to show that
for a sufficiently large number of iterations, and provided $0 < b < 2$, the algorithm
is stable and converges to the original value of the un-filtered field for all finite
wavenumbers on the mesh (Nikolaou and Vervisch 2018). The coefficient $b$ is typically
taken to equal 1 for non-oscillatory convergence as shown in Nikolaou and Vervisch
(2018). The maximum number of iterations required for a sufficiently small
reconstruction error depends on the largest wavenumber resolved by the mesh i.e. on the
$h/\Delta$ ratio, with increasing resolution requiring a larger number of iterations.
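
The convergence behaviour described above is easy to observe numerically. The following sketch applies the van Cittert iteration of Eq. 28 with $b = 1$ to a one-dimensional periodic signal filtered with a Gaussian filter in Fourier space; the signal, the filter width, the number of iterations and the helper names are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_filter_periodic(f, dx, delta):
    """Gaussian filter of width delta applied to a periodic signal via FFT."""
    k = 2.0 * np.pi * np.fft.fftfreq(f.size, d=dx)
    return np.real(np.fft.ifft(np.fft.fft(f) * np.exp(-(k * delta) ** 2 / 24.0)))


def van_cittert(f_bar, dx, delta, n_iter, b=1.0):
    """Approximate the unfiltered field from f_bar with n_iter iterations."""
    f_star = f_bar.copy()  # zeroth iterate is the filtered field itself
    for _ in range(n_iter):
        f_star = f_star + b * (f_bar - gaussian_filter_periodic(f_star, dx, delta))
    return f_star


n = 128
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
delta = 0.5
phi = np.sin(x) + 0.5 * np.cos(3.0 * x)  # stand-in for the unfiltered field
phi_bar = gaussian_filter_periodic(phi, dx, delta)

# Maximum reconstruction error after 0, 1, ..., 5 iterations.
errors = [np.max(np.abs(van_cittert(phi_bar, dx, delta, m) - phi)) for m in range(6)]
```

Each Fourier mode of the error is multiplied by one minus the filter transfer function at every iteration, so modes that the filter attenuates strongly are recovered more slowly. This is consistent with the need for more iterations as the largest resolved wavenumber increases.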


4 Machine-Learning Based Models 


The use of machine-learning methods, and specifically artificial neural networks,
is justified theoretically by the seminal work of Hornik (1991),
where it was proven that a feed-forward neural network, even with a single hidden
layer, acts as a universal function approximator (for functions with certain properties)
in the limit of a sufficiently large number of nodes. As a result, algebraic closures
of increasing complexity can in principle be developed, e.g. for the stress
tensor, by adjusting the number of layers and/or nodes. Machine-learning methods
for modelling the stress tensor in the context of LES can (thus far) be
roughly divided into three distinct categories:


(a) Optimization/tuning of existing model parameters and/or their evaluation pro- 
cedures. 

(b) Direct modelling of the stress tensor using as inputs variables which are resolved 
by the LES. 

(c) Deconvolution-based approaches.

In comparison to non-reacting flows, the use of machine-learning for modelling
purposes in reacting flows is scarce, and it has primarily been used to model/accelerate
the chemical kinetics (Chatzopoulos and Rigopoulos 2013; Ihme et al. 2009; Sen and
Menon 2009; Sen et al. 2010). In terms of modelling, convolutional networks were
successfully employed in Lapeyre et al. (2019) to model the Flame Surface Density
(FSD), which is an important term in reacting LES (Nikolaou and Swaminathan
2018), and were shown to outperform classic state of the art algebraic models. In
Nikolaou et al. (2018, 2019) convolutional networks were used in a deconvolution-
based context to model the scalar variance, a key modelling parameter in flamelet
methods, while Seltz et al. (2019) employed convolutional neural networks to provide
a unified modelling framework for both the source and scalar flux terms in the filtered
scalar transport equation. With regards to modelling the stress tensor, categories
(a)-(c) are discussed in the text which follows.


4.1 Type (a) 


Probably the first application of machine-learning in LES with regards to the stress 
tensor dates to the work of Sarghini et al. (2003), in which a neural network was 
trained to predict the turbulent viscosity parameter in the Smagorinsky part of a 
mixed model (Smagorinsky+Bardina). The network was trained by first running LES 
at $Re_\tau = 180$ with Bardina’s model and the viscosity parameter calculated using the 
classic dynamic procedure. The data generated from the LES were then used to train 
the network to essentially replace the more expensive dynamic calculation of the 
viscosity parameter. The inputs consisted of the nine velocity gradients 
$\partial \bar{u}_i/\partial x_j$ and the six velocity fluctuation products $u'_i u'_j$. 
The network was four layers deep and fully connected, 1(15)-2(12)-3(6)-4(1), with the 
numbers in parentheses indicating the number of neurons in each layer. The authors 
reported a 20% speedup in comparison to using the dynamic procedure, and that the 
network performed well for a certain range of $Re_\tau$ close to the training Reynolds 
number. For larger Reynolds numbers, at $Re_\tau = 1050$, a novel training procedure 
was concluded to be required. 
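
To make the layer notation concrete, the following sketch builds a fully connected 
network with the 1(15)-2(12)-3(6)-4(1) layout and evaluates it once. Only the layer 
sizes come from the text; the tanh activation, the linear output layer, the random 
placeholder weights and the names used here are assumptions made for illustration. 

```python
import numpy as np

layer_sizes = [15, 12, 6, 1]   # 1(15)-2(12)-3(6)-4(1)
rng = np.random.default_rng(0)

# Random placeholder weights and zero biases; a real model would learn these from LES data.
weights = [rng.standard_normal((n_out, n_in)) * 0.1
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Fully connected forward pass; tanh hidden activations are an assumption."""
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(w @ a + b)
    return weights[-1] @ a + biases[-1]   # linear output layer

x = rng.standard_normal(15)   # stand-in for 9 velocity gradients + 6 fluctuation products
output = forward(x)
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(output.shape, n_params)
```

A network of this size has only a few hundred trainable parameters, which helps 
explain why evaluating it can be cheaper than the dynamic procedure it replaces. 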

In a more recent study (Xie et al. 2019) a version of the Clark model presented in 
Sect. 2 was adopted having two tunable parameters instead of one: one for the gradient 
part and the other for the Smagorinsky part. DNS data of compressible decaying 
turbulence were then used to train a neural network to predict these two parameters 
using as inputs the filtered velocity divergence $\partial \bar{u}_i/\partial x_i$, the 
filtered vorticity magnitude $|\epsilon_{ijk}\,\partial \bar{u}_k/\partial x_j|$, the 
filtered velocity gradient magnitude 
$\sqrt{(\partial \bar{u}_i/\partial x_j)(\partial \bar{u}_i/\partial x_j)}$ and 
the filtered strain-rate tensor magnitude $\sqrt{\bar{S}_{ij}\bar{S}_{ij}}$. The developed 
networks showed improved performance over the static/dynamic Smagorinsky and classic 
Clark models in the a posteriori testing which followed. 


4.2 Type (b) 


The first direct modelling approach dates to the work of Gamahara and Hattori (2017), 
where DNS data of turbulent channel flow at $Re_\tau = 180$ were used for training the 
networks in the usual approach whereby the DNS data are filtered to simulate an LES. 
A range of possible inputs were tested: (a) $\{y, \bar{S}_{ij}\}$, (b) 
$\{y, \bar{S}_{ij}, \bar{\Omega}_{ij}\}$, (c) $\{y, \partial \bar{u}_i/\partial x_j\}$ 
and (d) $\{\partial \bar{u}_i/\partial x_j\}$, where 
$\bar{\Omega}_{ij} = (\partial \bar{u}_i/\partial x_j - \partial \bar{u}_j/\partial x_i)/2$ 
is the rotation-rate tensor and $y$ is the distance from the wall. In total six 
three-layer fully connected networks were trained, i.e. one for each component of the 
stress tensor. Correlation coefficients were then extracted between the predicted 
components of the stress tensor and those extracted from the DNS. For the largest and 
most dominant streamwise component $\tau_{11}$, all four sets showed similar correlations 
in the region of 0.8, with group (c) having the highest. This group was then tested 
(a priori) against DNS data of higher Reynolds number at $Re_\tau = 400$ and 
$Re_\tau = 800$ with overall good results. A posteriori tests at $Re_\tau = 180$ and 
$Re_\tau = 400$ were also conducted in the same study with overall good results in 
comparison to the classic Smagorinsky model, even though no obvious advantage was 
reported by the authors. 
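
As a brief illustration of how such input sets can be assembled, the sketch below 
splits a velocity gradient tensor into its strain-rate and rotation-rate parts and 
forms an input vector of type $\{y, S_{ij}\}$. The random gradient, the wall distance 
value and the choice of passing only the six independent strain-rate components are 
assumptions made here, not details reported in the cited study. 

```python
import numpy as np

def strain_and_rotation(grad_u):
    """Split grad_u[i, j] = du_i/dx_j into symmetric and antisymmetric parts."""
    s = 0.5 * (grad_u + grad_u.T)       # strain-rate tensor S_ij
    omega = 0.5 * (grad_u - grad_u.T)   # rotation-rate tensor Omega_ij
    return s, omega

rng = np.random.default_rng(1)
grad_u = rng.standard_normal((3, 3))    # placeholder for a filtered velocity gradient
s, omega = strain_and_rotation(grad_u)

# Input set of type {y, S_ij}: wall distance plus the six independent S components.
y_wall = 0.1                            # hypothetical wall distance
upper = np.triu_indices(3)
features = np.concatenate(([y_wall], s[upper]))
print(features.shape)
```

Since $S_{ij}$ and $\Omega_{ij}$ together carry the same information as the full 
gradient tensor, similar correlations across the input sets are perhaps not 
unexpected. 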

In the same spirit as Gamahara and Hattori (2017), Wang et al. (2018) used DNS 
data to train a network to directly predict the stress tensor. The DNS data corresponded 
to homogeneous decaying turbulence at $Re_\lambda = 220$. Five different sets of 
inputs were tested using four-layer and five-layer networks: (a) $\bar{u}_i$: 
1(3)-2(20)-3(10)-4(1), (b) $\partial \bar{u}_i/\partial x_j$: 1(9)-2(40)-3(20)-4(1), 
(c) $\partial^2 \bar{u}_i/\partial x_j^2$: 1(9)-2(40)-3(20)-4(1), (d) 
$\partial^2 \bar{u}_i/\partial x_j \partial x_k$: 1(9)-2(40)-3(20)-4(1) and (e) all of 
the previous inputs: 1(30)-2(90)-3(60)-4(30)-5(1). As in Gamahara and Hattori, one 
network was developed for each component of the stress tensor. Of all the inputs 
tested, groups (b) and (e) produced the highest correlations in a priori testing, with 
group (e) however only marginally improving the correlations at the expense of a more 
complex network. The importance of using the velocity gradients, much like in the 
study of Gamahara and Hattori, was therefore confirmed, albeit in a different 
configuration. This is not surprising since the velocity gradients appear in many 
models for the stress tensor. A further refined network based on group (b) was then 
developed, tested a posteriori in LES, and compared against the static and dynamic 
Smagorinsky models. The ANN model showed improved agreement in comparison to the two 
classic models, both in predicting the temporal evolution of the kinetic energy and 
its dissipation rate. In terms of computational cost, the ANN model was found to be 
3.6 times slower than the static Smagorinsky model and 1.8 times slower than the 
dynamic Smagorinsky model, indicating that neural network models need to be as 
simple as possible to limit computational cost. 

Following Wang et al. (2018), in Zhou et al. (2019) a similar procedure was applied 
to the same configuration, i.e. decaying homogeneous turbulence, in order to develop 
a network for the stress tensor. In contrast to the previous works (Gamahara and 
Hattori 2017; Wang et al. 2018), a single network was trained for all six components 
of the stress tensor while additionally taking into account the filter width, which 
along with the nine velocity gradients constituted the input set to the network. The 
evaluation was performed both a priori against the DNS data and a posteriori with LES, 
with the ANN-based model showing an overall improved performance in comparison 
to the dynamic Smagorinsky model. 

In a more recent study (Park and Choi 2021) the case of turbulent channel flow was 
revisited. As in the work of Gamahara and Hattori (2017), similar inputs were tested, 
but with a four-layer network and six outputs instead. The inputs tested included 
single-point but also multiple-point variables along the streamwise and spanwise 
directions. The inputs consisted of (a) $\bar{S}_{ij}$ at a single point, (b) 
$\partial \bar{u}_i/\partial x_j$ at a single point, (c) $\bar{S}_{ij}$ at multiple 
points, (d) $\partial \bar{u}_i/\partial x_j$ at multiple points and (e) 
$\{\bar{u}_i, \partial \bar{u}_i/\partial x_j\}$ at multiple points. In the a 
priori tests it was found that groups (c) and (d) provided the highest correlations 
and reasonably predicted the backscatter. However, in a posteriori tests it was found 
that these inputs led to instabilities unless backscatter clipping was used. The 
single-point group (a), on the other hand, showed very good agreement in the a 
posteriori tests despite the lower correlations observed in the a priori tests. 

In reacting flows, an a posteriori study using a closely-related data-based approach 
has been examined in Schoepplein et al. (2018), where Gene-Expression Programming 
(GEP) was employed. In this approach $\tau_{ij}$ was assumed to depend on the strain-rate 
and the rotation-rate tensors $\tilde{S}_{ij}$ and $\tilde{\Omega}_{ij}$ respectively 
(as in Gamahara and Hattori 2017), but also on the filter width $\Delta$ and filtered 
density $\bar{\rho}$. GEP was then used to derive a best-fit function for the stress 
tensor which showed good agreement against the DNS data. 

The direct modelling approach for reacting flows was first examined in Nikolaou 
et al. (2021). A DNS database of a turbulent premixed hydrogen V-flame was used 
in order to train a network to predict all six components of the stress tensor using 
as inputs the filtered density $\bar{\rho}$ and the nine velocity gradients 
$\partial \tilde{u}_i/\partial x_j$ (suitably normalised). In comparison to previous 
studies in the literature, this DNS configuration was particularly challenging to model 
due to the strong inhomogeneity in the direction perpendicular to the mean stream-wise 
flow, the presence of a bluff body, and the presence of heat release modelled using 
detailed chemistry; the configuration is shown in Fig. 1. The lowest turbulence cases 
V60 and V60H ($Re_T = 220$) were used for training the networks, while the highest 
turbulence level case V90 ($Re_T = 562.8$) was used for testing the networks. A 
1(10)-2(40)-3(10)-4(18)-5(6) network structure was developed for each filter width 
considered, able to predict all six components of the stress tensor (Nikolaou et al. 
2021). In contrast to previous studies employing fully connected layers, in order to 
account for the strong inhomogeneity in the cross-stream directions it was found 
necessary to decouple layers 4 and 5 by introducing 3-to-1 connections between these 
two layers rather than full connectivity. 
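
One way to picture the 3-to-1 connectivity between layers 4 and 5 is as a fully 
connected layer whose weight matrix is multiplied by a block mask, so that each of the 
six outputs sees only three of the eighteen nodes in layer 4. The sketch below is a 
hypothetical illustration of this idea: the contiguous grouping of nodes, the linear 
output and the random weights are assumptions, not the published implementation. 

```python
import numpy as np

n_hidden, n_out, group = 18, 6, 3

# mask[k, j] = 1 only if hidden node j belongs to the group feeding output k.
mask = np.kron(np.eye(n_out), np.ones((1, group)))   # shape (6, 18)

rng = np.random.default_rng(2)
w = rng.standard_normal((n_out, n_hidden)) * mask    # zero out cross-group weights
b = np.zeros(n_out)

def output_layer(h4):
    """Map the 18 activations of layer 4 to the 6 stress components."""
    return w @ h4 + b

h4 = rng.standard_normal(n_hidden)
tau = output_layer(h4)

# Perturbing a node in the first group should only change the first output.
h4_pert = h4.copy()
h4_pert[0] += 1.0
changed = ~np.isclose(output_layer(h4_pert), tau)
print(changed)
```

With this structure each stress component has its own small set of final weights, 
so a dominant component cannot directly influence the output of a weaker one through 
the last layer. 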

A thorough a priori comparison against all models presented in Sect. 2 was conducted 
for all three filter widths considered, i.e. at $\Delta/\delta_L = 1$, 2 and 3, where 
$\delta_L$ is the laminar thermal flame thickness. Figures 2 and 3 show the instantaneous 
predictions (normalised) of all models considered for the largest filter width for the 
dominant components $\tau_{11}$ and $\tau_{13}$ respectively. These results are 
quantified in terms of the Pearson correlation coefficient for each individual component 
of the stress tensor, averaged over all filter widths, in Fig. 4. The results show that 
the networks are able to outperform the predictions obtained using the classic models, 
while the work in Nikolaou et al. (2021) also confirmed the results found in Klein et 
al. (2015) on the poor performance of the Smagorinsky model (static and dynamic) for 
reacting flows. 

Fig. 2 Scatter plots of instantaneous values of DNS and modelled $\tau_{11}$ on the LES 
mesh, for $\Delta^{+} = 3$ (Nikolaou et al. 2021) 

Another important point to consider in the model evaluation step is the ability 
of a model to predict the correct relative magnitude between the different stress 
components, which amounts to evaluating the alignment angle between the DNS and 
modelled resultant stress in a given direction. A perfect model would correspond to a 
zero alignment angle between the modelled and DNS stresses in a particular direction, 
and the probability density function would approach a $\delta$ function at zero. This 
evaluation step is particularly important in flows with strong inhomogeneities, since 
in such cases one must ensure that the model’s predictions are not biased towards 
any of the dominant or non-dominant components of the stress tensor. Therefore, in 
a further evaluation step in Nikolaou et al. (2021), probability density functions of 
the alignment angle between the modelled and DNS stress tensor $\tau_{ij}$ were extracted 
and compared for each model. The results are shown in Fig. 5, where it is apparent 
that the ANN-based model shows an improved performance in comparison to the 
classical models. 
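
A minimal sketch of such an alignment check is given below. It assumes that the 
resultant stress in a direction $n$ is the vector $\tau_{ij} n_j$ and that the alignment 
angle is the angle between the DNS and modelled vectors; the precise definition used 
in Nikolaou et al. (2021) is not reproduced here, and the tensors are synthetic. 

```python
import numpy as np

def alignment_angle(tau_dns, tau_model, direction):
    """Angle (radians) between the DNS and modelled stress vectors tau_ij n_j."""
    n = np.asarray(direction, dtype=float)
    n = n / np.linalg.norm(n)
    t_dns = tau_dns @ n
    t_mod = tau_model @ n
    cos_theta = t_dns @ t_mod / (np.linalg.norm(t_dns) * np.linalg.norm(t_mod))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

rng = np.random.default_rng(3)
a = rng.standard_normal((3, 3))
tau_dns = 0.5 * (a + a.T)                               # synthetic symmetric "DNS" stress
tau_model = tau_dns + 0.1 * np.diag([1.0, 0.0, 0.0])    # model biased in one component

theta_perfect = alignment_angle(tau_dns, tau_dns, [1, 0, 0])
theta_scaled = alignment_angle(tau_dns, 2.0 * tau_dns, [1, 0, 0])
theta_biased = alignment_angle(tau_dns, tau_model, [1, 0, 0])
print(theta_perfect, theta_scaled, theta_biased)
```

A uniformly rescaled model still gives a near-zero angle, so the alignment angle 
isolates errors in the relative magnitude of the components and complements 
magnitude-based metrics. 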
4.3 Type (c) 


The first use of machine-learning in a deconvolution-based context dates to the work 
of Maulik and San (2017), where a single-layer network with 100 neurons was trained 
to recover estimates of the unfiltered velocity components $u_i^*$ from their filtered 
counterparts $\bar{u}_i$. The inputs to the network consisted of the filtered velocity 
components in the neighbourhood of a given point. This enabled the direct modelling of 
the stress tensor using explicit filtering on the deconvoluted variables. The developed 
networks were tested a priori for different cases including 2D Kraichnan, 3D Kolmogorov 
and compressible stratified turbulence, with overall good results. 

In the same spirit, a neural network was trained in Yuan et al. (2020) to reconstruct 
the unfiltered velocity components which was tested both against the DNS data and 
a posteriori in LES of forced isotropic turbulence. The inputs consisted of the filtered 
velocities in the region surrounding a given point as in Maulik and San (2017) and the 
outputs consisted of the three unfiltered velocity components which were then filtered 
explicitly to model the stress tensor as in classical deconvolution-based approaches. 
In a posteriori testing, the ANN-based models provided improved predictions over 
the dynamic Smagorinsky model. 
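
The final step of these approaches, i.e. explicitly filtering the reconstructed 
velocities to obtain the stress, can be sketched as follows for a periodic 
one-dimensional, constant-density setting. The analytic fields stand in for network 
outputs, and the Gaussian transfer function $\exp(-k^2\Delta^2/24)$ is an assumption; 
the function names are hypothetical. 

```python
import numpy as np

def gaussian_filter(f, delta, length):
    """Gaussian filter of a periodic 1D field, applied in Fourier space."""
    n = f.size
    k = 2.0 * np.pi * np.fft.rfftfreq(n, d=length / n)
    return np.fft.irfft(np.exp(-(k * delta) ** 2 / 24.0) * np.fft.rfft(f), n=n)

def deconvolution_stress(u_star, v_star, delta, length):
    """tau_uv ~ filter(u* v*) - filter(u*) filter(v*), constant-density form."""
    filtered_product = gaussian_filter(u_star * v_star, delta, length)
    product_of_filtered = (gaussian_filter(u_star, delta, length)
                           * gaussian_filter(v_star, delta, length))
    return filtered_product - product_of_filtered

length, n = 2.0 * np.pi, 128
x = np.linspace(0.0, length, n, endpoint=False)
delta = 8.0 * length / n
u_star = np.sin(3.0 * x) + 0.3 * np.cos(7.0 * x)   # stand-ins for network outputs
v_star = np.cos(3.0 * x)

tau_11 = deconvolution_stress(u_star, u_star, delta, length)
tau_12 = deconvolution_stress(u_star, v_star, delta, length)
tau_const = deconvolution_stress(np.ones(n), np.ones(n), delta, length)
print(tau_11.min(), np.abs(tau_12).max())
```

Because the Gaussian kernel is positive, the normal component obtained in this way 
stays non-negative, which is one attraction of forming the stress from reconstructed 
fields rather than predicting it directly. 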


5 A Note: Sub-grid Versus Sub-filter 


It is important to note that the terms “sub-grid” and “sub-filter” are different. 
“Sub-grid” refers to scales not resolved by the mesh $h$, while “sub-filter” refers to 
scales not resolved by the filter width $\Delta$. In the majority of classic approaches 
$h/\Delta = 1$ and the terms are equivalent. However, in approaches which include 
deconvolution/machine-learning $h/\Delta < 1$, in which case the terms are not 
equivalent: in such cases “sub-filter” refers to scales between $h$ and $\Delta$ which 
are resolved by the mesh and can potentially be recovered, e.g. using deconvolution 
and/or suitably trained neural/convolutional networks. 


6 Challenges of Data-Based Models 


6.1 Universality 


As the name suggests, data-based methods depend on data. One can view machine- 
learning methods such as ANNs and CNNs as a multi-dimensional data-fitting pro- 
cedure. As a result, the predictive ability of a network depends on the dataset. For 
datasets not too dissimilar to the dataset used to train a network in the first place, 
the predictions are expected to be reasonably good since in such cases inference is 
equivalent to a form of high-dimensional interpolation. For datasets which are too 
dissimilar (which lie far from the multi-dimensional fitted surface) the predictions 
are expected to be poorer in comparison since in such cases inference is equivalent to 
extrapolation. For instance, a neural network trained solely on homogeneous decay- 
ing turbulence data to predict the stress tensor would probably perform poorly in 
shear-dominated flows and vice versa. Increasing the size of the training dataset is 
always an option; however, this would lead to even more complex networks with increased 
computational cost. Another option would be to train case-specific networks and switch 
between them depending on the local flow configuration. In general, the universality 
of a network depends on the size, quality, and diversity of the databases used for 
training. 
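
The distinction between interpolation and extrapolation can be demonstrated with a 
deliberately simple data-fitted model. In the sketch below a polynomial plays the role 
of the network; the target function, polynomial degree and sampling ranges are 
arbitrary choices made for illustration only. 

```python
import numpy as np

x_train = np.linspace(0.0, np.pi, 50)
coeffs = np.polyfit(x_train, np.sin(x_train), deg=5)   # polynomial "model"

x_inside = np.linspace(0.2, 3.0, 40)                   # within the training range
x_outside = np.linspace(1.5 * np.pi, 2.0 * np.pi, 40)  # far outside it

err_inside = np.max(np.abs(np.polyval(coeffs, x_inside) - np.sin(x_inside)))
err_outside = np.max(np.abs(np.polyval(coeffs, x_outside) - np.sin(x_outside)))
print(err_inside, err_outside)
```

The fit is accurate where training data exist and degrades rapidly away from them, 
which is the same behaviour one would expect from a network applied to a flow regime 
it has not seen. 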


6.2 Choice and Pre-processing of Data 


Any inputs to a data-driven model need to be appropriately scaled, and standardization 
is a commonly used procedure for this purpose. In the turbulence modelling community, 
such standardization is usually performed on input variables which have already been 
normalized using physical quantities such as the mean flow velocity and a turbulence 
length scale. However, it is often the case that such reference quantities are not 
available, or that they do not necessarily represent the flow phenomena in practical 
problems. For example, non-reacting flow DNS is often performed in terms of 
non-dimensional quantities. One way to train a model is to use such non-dimensional 
quantities as they are, with or without standardization. While such a strategy does 
not require normalization based on physical quantities for training, applying a model 
trained in this way to practical LES problems raises the issue of finding 
appropriate parameters to non-dimensionalize the quantities. 
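
A minimal sketch of standardization is given below, assuming the common convention 
of subtracting the mean and dividing by the standard deviation of each input, both 
computed from the training set only. The synthetic two-feature data are hypothetical 
and serve only to show inputs with very different scales. 

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical inputs: rows are samples, columns are features with different scales.
x_train = rng.normal(loc=[5.0, -200.0], scale=[0.5, 50.0], size=(1000, 2))
x_test = rng.normal(loc=[5.0, -200.0], scale=[0.5, 50.0], size=(200, 2))

mean = x_train.mean(axis=0)
std = x_train.std(axis=0)

def standardize(x):
    """Scale with training-set statistics so that test data see the same transform."""
    return (x - mean) / std

z_train = standardize(x_train)
z_test = standardize(x_test)
print(z_train.mean(axis=0), z_train.std(axis=0))
```

The stored `mean` and `std` become part of the model: when the network is used in an 
LES, the same values must be applied to the solver's inputs, which is where the 
difficulty of matching the non-dimensionalization arises. 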


6.3 Training, Validation, Testing 


Developing a model based on machine-learning typically involves three steps, namely 
training, validation, and testing. The validation step is typically performed during 
the training phase on a subset of the training data, while the chosen testing dataset 
varies from study to study. In some studies, for instance, the testing dataset is also 
a subset of the training dataset, albeit at different spatio-temporal coordinates within 
the computational domain. This approach is convenient as there is no need to perform 
additional and often expensive simulations to generate new data, e.g. at a higher Re 
or Ma number. However, this approach may introduce a bias in the predictive ability 
of the network since the testing dataset may be too similar to the training/validation 
datasets. Therefore, careful thought is required on the most appropriate training and 
testing strategy. 


6.4 Network Structure 


The choice of network structure is typically made on a trial-and-error basis, and 
to date there is no formal/theoretical procedure to obtain a priori the best network 
structure (number of layers, number of nodes, type of activation function, type of loss 
function) for a given set of inputs and outputs which minimises the training error. In 
addition, increasing the number of layers and/or nodes does not always improve the 
predictive ability of the network. Furthermore, there is no formal way 
of choosing a priori the best set of input variables for a given output set and for a 
given network structure; typically a range of inputs are tested based on intuition. 

When it comes to practical LES, some networks are more difficult to implement 
and parallelise in LES solvers than others. For instance, point-wise inputs are very 
convenient for LES applications while inputs requiring the values of the surround- 
ing mesh points are tricky to implement and parallelise in practice using MPI. This 
is often the case with CNNs and other types of networks utilizing plane and vol- 
umetric inputs on Cartesian mesh points. However most LES codes often employ 
non-uniform and unstructured meshes. Of course, the fields can be interpolated to 
generate CNN-like inputs at every iteration at every point, but this would result in 
increased computational cost and other associated issues (Kashefi et al. 2021). One 
potential strategy to circumvent this issue while keeping the important spatial infor- 
mation for the inputs is so-called “point-cloud deep learning” (Kashefi et al. 2021). 
Although this framework is not yet well established for modelling the stress tensor, 
the compatibility to arbitrary mesh geometry is something future machine-learning 
models should consider. 


6.5 LES Mesh Size 


The development of LES models using DNS data involves explicit filtering operations 
with a filter size $\Delta$. An important question is then how one chooses $h$, i.e. the 
LES mesh size. Typically in classic approaches $h/\Delta = 1$, but this choice does not 
ensure that the resolved fields, such as the velocity and scalar fields, are well-resolved. 
Consequently, the gradients of these variables as obtained on the LES mesh, which 
are typically used as inputs to neural networks, are also not well-resolved, which 
introduces a bias in the predictive ability of the network; this is also the case when 
evaluating the performance of classic models which involve gradient terms. 

In an effort to resolve this, Nikolaou and Vervisch (2018) proposed a criterion 
for the LES mesh size based on a scalar field varying from 0 to 1. The criterion 
was originally proposed for a “reaction progress variable” (e.g. non-dimensional 
temperature), but the scalar can also be regarded as a normalized fluctuating velocity 
component $\phi(x)$ ($0 < \phi < 1$), 

$$\phi(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{\sqrt{\pi}\,x}{\delta}\right)\right] \qquad (29)$$

where $\delta$ is a length scale for the gradient defined as 
$\delta = 1/\max(d\phi/dx)$. Filtering Eq. (29) based on the filtering operation 
(Eq. (2)) with a Gaussian kernel, the filtered field $\bar{\phi}(x)$ can be obtained as 

$$\bar{\phi}(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{\sqrt{\pi}\,x}{\bar{\delta}}\right)\right] \qquad (30)$$

The length scale for the gradient of the filtered field can be obtained in the same 
manner as $\bar{\delta} = 1/\max(d\bar{\phi}/dx)$, which leads to 

$$\bar{\delta} = \delta\left(1 + \frac{\pi}{6}\frac{\Delta^2}{\delta^2}\right)^{1/2} \qquad (31)$$

ensuring $\bar{\delta}/\delta > 1$, i.e. that the length scale increases due to the 
filtering operation. It is more useful to rewrite Eq. (31) in terms of 
$\bar{\delta}/\Delta$, since our interest here is how fine the mesh should be to capture 
the gradient information of the filtered field for a given filter width $\Delta$, 

$$\frac{\bar{\delta}}{\Delta} = \left(\frac{\delta^2}{\Delta^2} + \frac{\pi}{6}\right)^{1/2} \qquad (32)$$

Usually, $n$ mesh points are required within $\bar{\delta}$ to resolve a filtered 
gradient, which results in 

$$\frac{h}{\Delta} = \frac{1}{n}\left(\frac{\delta^2}{\Delta^2} + \frac{\pi}{6}\right)^{1/2} \qquad (33)$$

In most turbulent flows, it is expected that $\delta/\Delta \approx 0$. Equation (33) 
then yields $h/\Delta \approx 0.36$ for $n = 2$ (two mesh points within the filtered 
slope) and $h/\Delta \approx 0.18$ for $n = 4$, leading to the insight that the LES mesh 
required to capture the filtered gradient should have two to five mesh points within 
$\Delta$. This consideration is required when generating filtered quantities from 
resolved fields such as DNS, especially for machine-learning with gradient-related 
inputs, but is also useful for conventional gradient model assessments. 

Fig. 6 Scatter plots of target values $y_i$ and predicted values $\hat{y}_i$. a-f: 
scenarios (a) to (f), respectively 
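
The quoted mesh ratios can be checked with a few lines of code. The sketch assumes 
that Eq. (33) reads $h/\Delta = (1/n)\,(\delta^2/\Delta^2 + \pi/6)^{1/2}$, which is the 
form consistent with the values 0.36 and 0.18 given in the text; the function name is 
hypothetical. 

```python
import math

def mesh_to_filter_ratio(n_points, delta_over_filter=0.0):
    """h/Delta from Eq. (33): (1/n) * sqrt((delta/Delta)^2 + pi/6)."""
    return math.sqrt(delta_over_filter ** 2 + math.pi / 6.0) / n_points

ratio_n2 = mesh_to_filter_ratio(2)   # two points within the filtered slope
ratio_n4 = mesh_to_filter_ratio(4)   # four points within the filtered slope
print(round(ratio_n2, 2), round(ratio_n4, 2))   # prints 0.36 0.18
```

For an unfiltered gradient that is not negligible compared with the filter width, 
`delta_over_filter` is larger than zero and the criterion permits a somewhat coarser 
mesh. 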


6.6 Performance Metrics 


The quantification of prediction accuracy is very important, since in the modelling of 
the stress tensor a model assessment needs to be performed spatio-temporally and for 
all six components of the stress tensor; a comprehensive visual examination alone is 
not enough. Amongst the possible quantification methods, the mean squared error 
(MSE) would be the most convenient to use since it is already incorporated in the 
loss function of most machine-learning algorithms. Another choice is the root mean 
squared error (RMSE). However, MSE and RMSE are considered to be sensitive to 
local outliers, which are prevalent in non-linear phenomena. For this reason, the mean 
absolute error (MAE) may be more suitable for model assessment purposes. 

In various model developments in the turbulent flow community, the cross-correlation 
coefficient is also used extensively. While this quantity is familiar to 
the community, relying on this coefficient alone can bias the model performance 
assessment significantly. This point is illustrated by using the following simulated 
target values $y_i$ and predicted values $\hat{y}_i$ in scenarios (a) to (f), where $i$ 
is the index of the $N$ samples. 




(a) Predicted values are scattered around the target values. 

(b) Predicted values are scattered around the target values, but 15% of samples have 
a much larger deviation (outliers). 

(c) Predicted values are scattered around the target values, but 30% of samples have 
a much larger deviation (outliers). 

(d) Predicted values are scattered around the line $\hat{y}_i = 0.5 y_i + 0.25$. The 
deviation from the line is the same as in (a). 

(e) Predicted values are scattered around the line $\hat{y}_i = y_i + 0.15$. The 
deviation from the line is the same as in (a). 

(f) Predicted values are scattered around the line $\hat{y}_i = y_i + 0.30$. The 
deviation from the line is the same as in (a). 


Scenario (a) represents perhaps a good model. In turbulent flow problems where 
the variables take a wide range of values however, such a good model may output a 
prediction with a large deviation for a limited number of samples, and such situations 
may correspond to scenarios (b) and (c). The situations where the trend of predicted 
values is close to the target values but there is some deviation between the two may 
correspond to scenarios (d), (e) and (f). Examples of such scenarios are shown in 
Fig. 6. 

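For the scenarios (a)-(f), the following metrics, often used for model assessments, 
are considered: 


• Mean absolute error 

$$\epsilon_{MAE} = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i| \qquad (34)$$

• Relative mean absolute error 

$$\epsilon_{rMAE} = \frac{\epsilon_{MAE}}{\bar{y}} \qquad (35)$$

• Mean squared error 

$$\epsilon_{MSE} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \qquad (36)$$

• Root mean squared error 

$$\epsilon_{RMSE} = \sqrt{\epsilon_{MSE}} \qquad (37)$$

• Relative root mean squared error 

$$\epsilon_{rRMSE} = \frac{\epsilon_{RMSE}}{\bar{y}} \qquad (38)$$

• Pearson’s cross-correlation coefficient 

$$\rho_p = \frac{\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\left[\sum_{i=1}^{N} (y_i - \bar{y})^2\right]^{1/2}\left[\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2\right]^{1/2}} \qquad (39)$$

• Coefficient of determination 

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \qquad (40)$$

• Coefficient of Legates and McCabe (2013) 

$$E_1 = 1 - \frac{\sum_{i=1}^{N} |y_i - \hat{y}_i|}{\sum_{i=1}^{N} |y_i - \bar{y}|} \qquad (41)$$


Table 1 Scatter plots of generic target values $y_i$ and predicted values $\hat{y}_i$ 

Scenario | $\epsilon_{MAE}$ | $\epsilon_{rMAE}$ | $\epsilon_{MSE}$ | $\epsilon_{RMSE}$ | $\epsilon_{rRMSE}$ | $\rho_p$ | $\rho_p^2$ | $R^2$ | $E_1$ 
(a)      | 0.05 | 0.09 | 0.00 | 0.06 | 0.12 | 0.98 | 0.96 |  0.96 |  0.81 
(b)      | 0.11 | 0.23 | 0.06 | 0.24 | 0.49 | 0.76 | 0.58 |  0.29 |  0.55 
(c)      | 0.17 | 0.36 | 0.11 | 0.33 | 0.68 | 0.64 | 0.41 | -0.39 |  0.29 
(d)      | 0.13 | 0.26 | 0.02 | 0.14 | 0.30 | 0.98 | 0.96 |  0.74 |  0.48 
(e)      | 0.15 | 0.31 | 0.03 | 0.16 | 0.33 | 0.98 | 0.96 |  0.67 |  0.38 
(f)      | 0.30 | 0.62 | 0.09 | 0.31 | 0.63 | 0.98 | 0.96 | -0.18 | -0.23 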

In the list above an overbar denotes the mean value. The metrics $\rho_p$, $R^2$ and 
$E_1$ yield 1 for a perfect model. All of the above metrics are computed and summarised 
in Table 1 for scenarios (a)-(f). Note that $\rho_p^2$ is also shown since it is often 
used as an alternative definition for the coefficient of determination. As clearly seen, 
the cross-correlation coefficient $\rho_p$ shows relatively high values for all the 
scenarios except for (c), where $\rho_p = 0.64$, which may still be acceptable for 
certain purposes. However, there is a substantial discrepancy between the intuitive 
interpretation of Fig. 6 and $\rho_p$ in Table 1 for scenarios (d)-(f). For these cases 
the relative errors $\epsilon_{rMAE}$ and $\epsilon_{rRMSE}$ vary from about 25% to 63%, 
while $\rho_p = 0.98$ for these scenarios. Also, $\epsilon_{MSE}$ and $R^2$ tend 
to be more sensitive to large deviations in a small number of samples than 
$\epsilon_{MAE}$ and $E_1$ respectively (see scenario (b)), which is attributed to the 
squared term $(y_i - \hat{y}_i)^2$. These considerations suggest that model assessments 
based on $\rho_p$ alone cannot thoroughly assess a model’s performance, and $\rho_p$ 
should be used along with visual examination and/or another metric. 
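
The insensitivity of the cross-correlation coefficient to a systematic offset can be 
reproduced with synthetic data. The sketch below implements the metrics discussed in 
this section using their standard definitions; the uniform target distribution, the 
scatter level and the sample size are assumptions and do not reproduce the exact data 
behind Table 1. 

```python
import numpy as np

def metrics(y, y_hat):
    """Error and correlation metrics for targets y and predictions y_hat."""
    err = y - y_hat
    y_mean = np.mean(y)
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    rho = np.corrcoef(y, y_hat)[0, 1]
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y_mean) ** 2)
    e1 = 1.0 - np.sum(np.abs(err)) / np.sum(np.abs(y - y_mean))
    return {"MAE": mae, "rMAE": mae / y_mean, "MSE": mse, "RMSE": rmse,
            "rRMSE": rmse / y_mean, "rho_p": rho, "R2": r2, "E1": e1}

rng = np.random.default_rng(5)
y = rng.uniform(0.0, 1.0, 2000)                 # assumed target distribution
scatter = rng.normal(0.0, 0.06, y.size)         # assumed scatter level
good = metrics(y, y + scatter)                  # resembles scenario (a)
shifted = metrics(y, y + 0.30 + scatter)        # resembles scenario (f)

for name in ("rho_p", "R2", "E1", "rMAE"):
    print(name, round(good[name], 2), round(shifted[name], 2))
```

Adding a constant to the predictions leaves $\rho_p$ unchanged while $R^2$ and $E_1$ 
drop sharply, which is the behaviour seen for scenarios (e) and (f). 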


7 Summary 


Machine-learning methods are increasingly being used by the fluid mechanics community 
for modelling purposes, and in particular for the unresolved stress tensor. The 
applications are diverse, and a large number of both a priori and a posteriori 
assessments have shown data-based methods either to outperform the predictions of 
classic models or at least to match them. The developed networks are typically one 
to five layers deep with around one hundred neurons in each hidden layer, with the 
structure of the networks varying from study to study. Overall, the best-performing 
inputs appear to be gradients of the filtered velocity components and functions of the 
velocity gradients, such as the strain-rate tensor and the rotation-rate tensor, 
irrespective of the nature of the flow, i.e. reacting or non-reacting. The computational 
cost depends on the structure of the networks; most of the developed networks in the 
literature, despite being slower than the classical algebraic models, 
exhibit a cost of around the same order of magnitude. Despite the success 
of the developed networks, however, some important issues remain, which are discussed 
in the text. The most important in the authors’ view is universality. The predictive 
ability and versatility of a network are tightly coupled to the dataset used for 
training in the first place. At present, in the majority of studies in the literature 
these databases are restricted to small-scale DNS of often canonical flow problems 
such as decaying homogeneous turbulence, turbulent channel flow, statistically planar 
freely-propagating flames etc., while in practical LES the flows are significantly 
more complex and also at higher Re and Ma numbers. In order to overcome this 
issue and to eventually obtain a truly case-independent and parameter-free 
machine-learning-based model for the stress tensor, further research is required at 
conditions which are more relevant for practical flows, including both a priori and a 
posteriori studies. 


Abstract Machine learning provides a set of new tools for the analysis, reduction 
and acceleration of combustion chemistry. The implementation of such tools is not 
new. However, with the emerging techniques of deep learning, renewed interest in 
implementing machine learning is fast growing. In this chapter, we illustrate appli- 
cations of machine learning in understanding chemistry, learning reaction rates and 
reaction mechanisms and in accelerating chemistry integration. 


1 Introduction and Motivation 


Machine-learning (ML), a term associated with a range of data analysis and discovery 
methods, can provide enabling tools for effective data-based science in the analysis, 
reduction and acceleration of combustion chemistry. The tools associated with ML 
can carry out a variety of automated tasks that either serve as effective substitutes 
for modern data analysis and discovery techniques applied to combustion chemistry 
or additional tools for its effective integration in CFD codes. 

The implementation of ML in combustion chemistry is not new. Several tools 
have been used for chemistry reduction or chemistry acceleration. Perhaps one of 
the earliest analysis tools used for combustion chemistry is principal component 
analysis (PCA) (Vajda et al. 2006). By identifying redundant species in a mechanism 
and eventually eliminating their reactions, PCA plays a similar role to more recent 
methods based on directed relation graphs (DRG) (Lu and Law 2005). 

Artificial neural networks (ANN) also have been used in combustion chemistry. 
Since the early work of Christos et al. (1995), ANNs have been used as substitutes 
for the direct evaluation of the chemical source terms in combustion. Besides their 
use as generalized function evaluators, ANNs have been used in other contexts, 
as discussed below. More recent applications of ANNs in combustion chemistry 
addressed the integration of chemically-stiff systems of equations. 

The premise of ML tools in combustion chemistry lies in the availability of an 
ever-expanding body of data from experiments and computations, and in the complexity 
of handling this chemistry in the presence of tens to thousands of chemical species 
and hundreds to tens of thousands of chemical reactions. Some of the challenges 
associated with combustion chemistry and potential applications of ML are highlighted 
below. 

First, chemistry integration represents the ultimate bottleneck in reacting flow 
simulations. This is partly attributed to the size of chemical systems, involving many 
species and reactions, and the stiffness of their chemistry. This stiffness is associated 
with the presence of disparate timescales for the different reactions in a chemical 
mechanism. Approaches to overcome the presence of such bottlenecks can rely on 
chemistry reduction, chemistry tabulation and strategies to remove the fast time 
scales in chemistry integration. This reduction can be implemented offline from 
detailed chemistry or in situ using adaptive chemistry techniques. Careful chemistry 
reduction can also achieve a significant reduction of the stiffness of the chemical 
systems through the elimination of fast reactions and associated species. 

Second, another difficult challenge with combustion chemistry is the development 
of new chemical mechanisms for an expanding range of fuels. Detailed mechanism 
development is a complex and time-consuming process that usually represents a first 
step prior to chemistry reduction. Identifying the elementary reactions relevant to a 
particular fuel oxidation, then determining their rates and relative importance in the 
mechanisms are integral steps in this process. Such an effort cannot be sustained given 
the need to develop the important elementary reaction data, especially data critical for 
the low-temperature oxidation for these fuels. More importantly, practical fuels tend 
to be complex blends and mixtures of different molecules. Establishing the chemical
description of tens or hundreds of molecules is very challenging and must include
models for their transport and thermodynamic properties. Until recently, strategies
to develop a reduced description of chemistry without access to detailed or skeletal
descriptions of chemistry have been limited to ad hoc global chemistry approaches
that optimize rate constants and stoichiometric coefficients for the global reactions
by matching global observables, such as flame speeds, ignition delay times or
extinction strain rates.

However, a growing body of data and detailed mechanisms is now available that 
can be exploited to develop “rules” for representing the chemistry of complex fuels 
(Buras et al. 2020; Ilies et al. 2021; Zhang and Sarathy 2021b,c; Zhang et al. 2021). 
Temporal measurements from shock tubes and rapid compression machines (RCMs),
although they may be limited to a subset of the chemical species present and may be
subject to experimental uncertainty, can provide important relief to detailed
mechanism development, as discussed below.

The challenges listed above lend themselves to applications of data science
and the implementation of ML tools for combustion chemistry discovery, reduction
and acceleration. The various ML methods in combustion chemistry and other
applications can generally be classified as either supervised (e.g. classification,
regression models) or unsupervised (e.g. clustering and PCA). Supervised models are
a class of models in which both input and output are known and prescribed from the
training data. Such data is called labeled. For example, in a regression ANN for
chemical source terms, we attempt to map the thermo-chemical state (i.e. pressure,
temperature and composition) to the chemical source terms. During unsupervised
learning, the output is not labeled. This approach may include, for example,
identifying principal components (using PCA) from a thermo-chemical state or
clustering states based on the proximity of their thermo-chemical state vectors.

Another class of models that has not been extensively used in combustion
chemistry is the so-called semi-supervised models. In semi-supervised models, both
labeled and unlabeled data are used for training. These models include, for example,
generative models, which are trained on available data to generate new, similar
data. A popular model of this type is the generative adversarial network (GAN).

As expected, ML approaches require data. The quality and quantity of the data are
critical, as discussed below. The models are trained on part of this data, while the
remaining portion is set aside for validation or testing.

In this chapter, we illustrate different implementations of ML tools in combustion
chemistry. The goal is not to provide a comprehensive review of these tools or to
address all studies involving ML for combustion chemistry. Instead, we attempt to
provide an overview of various applications of ML in combustion chemistry. It is
important to note that ML for combustion chemistry is a very active area of
research and more progress is expected in the coming years. The chapter
is divided into three general topics: (1) learning reaction rates, (2) learning
reaction mechanisms and (3) chemistry integration and acceleration.


2 Learning Reaction Rates 


The law of mass action and the Arrhenius model for the rate constant form the
traditional representation of the rate of reaction of chemical species in combustion.
This rate can be expressed as a linear combination of the rates of progress of the
elementary reactions in which a species is involved. The integration of chemistry is
limited by the cost of this evaluation as well as the inherent stiffness of reaction
mechanisms, exhibited by the wide range of timescales involved and the small
time-step size required to integrate chemistry in combustion simulations.

Artificial neural networks (ANNs) have been proposed as an alternative to the
direct evaluation of reaction rates based on the law of mass action and the Arrhenius
law. Perhaps one of the earliest implementations of ANNs in combustion is their use
as regression tools for species and temperature chemical source terms (Blasco et al.
1998, 1999, 2000; Chatzopoulos and Rigopoulos 2013; Chen et al. 2000; Christo et al.
1996; Christos et al. 1995; Flemming et al. 2005; Franke et al. 2017; Ihme 2010;
Ihme et al. 2008, 2009; Sen and Menon 2010a, b; Sinaei and Tabejamaat 2017). The
primary goal of representing chemical source terms with ANNs is to accelerate the
evaluation of chemistry. The different demonstrations of ANNs for chemistry
tabulation have shown that the ANN-based chemistry tabulation method is
computationally efficient and accurate.

Fig. 1 Illustration of the ANN-based matrix formulation for reaction rates

ANNs are perhaps the most versatile ML tools that have been used for combustion
chemistry and other applications. Among these, one of the most popular
architectures is the so-called multi-layer perceptron (MLP). A representative
MLP-ANN architecture is shown in Fig. 1. It is designed to construct a functional
relation between a prescribed input vector x = (x1, x2) and an output vector
y = (y1, y2). Within the context of a regression model, the ANN forms a function for
y in terms of x, i.e., y = f(x). The input layer in the figure contains the input
vector elements, which are represented by “neurons”. A similar arrangement is
present for the output layer, where each element is represented by a neuron. The
remaining neurons carrying values are in the hidden layer, which separates the input
and the output layers. In the illustration, there is only one hidden layer with
4 neurons. The illustrated MLP is fully connected, meaning that, from the first
hidden layer all the way to the output layer, each neuron is connected to all
neurons in the previous layer. The strengths of the connections are represented by
“weights”, and the value of a neuron in these layers is expressed in terms of the
values of the neurons of the previous layer weighted by the strengths of the
connections. Although not shown in the figure, additional “bias” neurons can be
added to the input and all hidden layers. The role of the bias neurons is to provide
more flexibility to train the model that relates the input to the output vectors.

To illustrate the relation between the input and the output layers, we use the
network illustrated in Fig. 1. The output $y_1$, which corresponds to the value of
the first neuron in the output layer, is expressed in terms of the hidden layer:

$$y_1 = f\left(\sum_{i=1}^{4} w_i^{(1)} a_i^{(1)} + b^{(1)}\right), \tag{1}$$

where the superscript (1) corresponds to the first hidden layer with weights
$w_i^{(1)}$ and values $a_i^{(1)}$ at the ith neuron in the hidden layer. $b^{(1)}$
is the bias value at the hidden layer and $f$ is the activation function. The bias
neuron value serves as an additional parameter to fine-tune the network architecture
and potentially reduce its complexity (i.e., fewer hidden layers or fewer neurons
per hidden layer). The value of the ith neuron, $a_i^{(1)}$, in the hidden layer can
be related to the input variables as follows:

$$a_i^{(1)} = f\left(w_{i1}^{(0)} x_1 + w_{i2}^{(0)} x_2 + b^{(0)}\right). \tag{2}$$

Here $w_{i1}^{(0)}$ and $w_{i2}^{(0)}$ correspond to the weights of the connections
between the input layer and the ith neuron in the first hidden layer associated with
inputs $x_1$ and $x_2$, respectively. The network is trained to determine the weights
of all connections from input to output layers and the bias values.

In matrix form, the output values for the hidden layer neurons and the output layer
neurons can be expressed as follows:

$$\mathbf{a}^{(1)} = f\left(\mathbf{W}^{(0)} \mathbf{x} + \mathbf{b}^{(0)}\right) \tag{3}$$

and

$$\mathbf{y} = f\left(\mathbf{W}^{(1)} \mathbf{a}^{(1)} + \mathbf{b}^{(1)}\right) \tag{4}$$

where $\mathbf{W}^{(0)}$ and $\mathbf{W}^{(1)}$ are the weight matrices
corresponding to the weights of the connections between the input and the first
hidden layer and between the first hidden layer and the output layer, respectively.
$\mathbf{b}^{(0)}$ and $\mathbf{b}^{(1)}$ are the bias vectors for the input and
the first hidden layers, respectively, with identical elements in each vector.

The expressions above can be generalized to relate a hidden layer or an output
layer at level n + 1 to the vector of values from the previous layer at level n:

$$\mathbf{y}^{(n+1)} = f\left(\mathbf{W}^{(n)} \mathbf{y}^{(n)} + \mathbf{b}^{(n)}\right) \tag{5}$$
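
As a minimal sketch of Eqs. (3) to (5), the forward pass of the 2-4-2 network of
Fig. 1 can be written in a few lines of plain Python. The weight values, the ReLU
hidden activation and the identity output activation are arbitrary choices made
for illustration; they are not taken from any trained network.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(W, b, v, f):
    """One layer of Eq. (5): f(W v + b), with W stored as a list of rows."""
    return [f(sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i)
            for row, b_i in zip(W, b)]

def mlp_forward(x, layers):
    """Apply Eq. (5) layer by layer, from the input to the output."""
    v = x
    for W, b, f in layers:
        v = dense(W, b, v, f)
    return v

# Illustrative (untrained) parameters for 2 inputs, 4 hidden neurons, 2 outputs.
W0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b0 = [0.0, 0.0, 0.0, 0.0]          # identical elements, as stated in the text
W1 = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
b1 = [0.5, 0.5]

hidden = dense(W0, b0, [1.0, 2.0], relu)                       # Eq. (3)
y = mlp_forward([1.0, 2.0], [(W0, b0, relu), (W1, b1, identity)])  # Eqs. (3)-(4)
```

With these values the hidden layer evaluates to (1, 2, 3, 0) and both outputs
equal 3.5.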


MLPs vary in complexity as well as in purpose. Accommodating complexity can
be achieved by increasing the number of hidden layers and the number of neurons per
hidden layer, and by varying the activation functions, which can differ from one
layer to another. Prescribing the loss function can also improve the prediction of
the target output. Although there are usual choices for the activation functions,
there is an inherent flexibility in the choice of network parameters, including the
activation function, to represent systems of equations representing physics, as
illustrated below.


2.1 Chemistry Regression via ANNs


In this section, we briefly summarize key considerations for establishing efficient
regression for chemical reaction rates using ANNs. Figure 2 illustrates a relatively
deep network topology, from the work of Wan et al. (2020), that constructs a
regression of the reaction rates for 10 species and the heat release rate for the
temperature equation. This network has 5 fully connected dense layers between the
input and output layers. In dense layers, neurons in a given layer are connected
through weights to all neurons in the previous layer. As indicated, the number of
neurons in the hidden layers is higher towards the input layer and decays towards
the output layer. The rectified linear unit (ReLU) activation function is used. The
network has approximately 180,000 weights to be optimized during the training stage,
which required approximately 2.2 h on an Nvidia GeForce GTX 1080 Ti GPU. Other
variants of the topology shown in Fig. 2 have been adopted in the literature (see,
for example, Blasco et al. 1998, 1999, 2000; Chatzopoulos and Rigopoulos 2013; Chen
et al. 2000; Christo et al. 1996; Christos et al. 1995; Flemming et al. 2005; Franke
et al. 2017; Ihme 2010; Ihme et al. 2008, 2009; Sen and Menon 2010a, b; Sinaei and
Tabejamaat 2017).

Determining all these chemical source terms invariably requires more complex
neural networks than those specialized to predict only one quantity. Within such
complex networks, the weights from the input layer to the layer prior to the last
layer are shared among all the output quantities, and the weights relating the last
hidden layer to the output layer are the primary differentiators for the individual
reaction rates. There are potentially three attractive features of using ANNs to
model chemical source terms. The first feature is the potential acceleration of the
evaluation of the chemical source terms on graphics processing units (GPUs) through
the integration of neural networks with existing accelerated packages designed to
optimize ANN evaluations on mixed hardware frameworks.

A second attractive feature is that ANNs can be made simpler by adopting only a
subset of the input. This is motivated by the inherent correlation of thermo-chemical
scalars in a chemical mechanism, which lends itself to dimensionality reduction
methods. Alternatively, low-dimensional manifold parameters, such as principal
components (PCs) from PCA, or a choice of representative species, including major
reactants, products and intermediates, can be used.

Fig. 2 Illustration of the ANN-based matrix formulation for reaction rates with
multiple inputs and multiple outputs (from Wan et al. (2020))

A third feature of using ANNs for learning chemical reaction rates, related to
the previous one, is that, if a subset of the inputs is used, the solution vector
may also require only a subset of the thermo-chemical scalars to be transported,
which corresponds primarily to the thermo-chemical scalars in the input vector.
This can reduce the computational cost. It follows that, if species and associated
reactions that represent a bottleneck in chemistry integration are eliminated, the
stiffness of the chemical system is significantly reduced, further accelerating
chemistry integration.

Implementing the regression for chemical source terms within a single ANN has
a number of advantages. First, constraints can be built into the training for the
chemical source terms, for example to enforce the conservation of elements, mass or
energy. Moreover, a single network with a number of shared weights may be exploited
for computational efficiency, since the contributions to the individual source terms
occur primarily at the connections between the last hidden layer and the output
layer.
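
One possible way to build such a constraint into the training is a soft penalty
added to the regression loss. The sketch below assumes mass-based source terms,
whose sum over all species must vanish; the function name, the penalty weight and
the sample values are hypothetical, and the studies cited in this chapter may
enforce conservation differently.

```python
def constrained_loss(pred, target, penalty=1.0):
    """Mean squared error plus a soft mass-conservation penalty.

    pred and target hold mass-based source terms for all species, so a
    physically consistent prediction satisfies sum(pred) == 0.
    """
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    mass_residual = sum(pred)
    return mse + penalty * mass_residual ** 2

# Hypothetical source terms for three species (arbitrary units).
target = [-1.0, -0.5, 1.5]      # sums to zero: mass is conserved
biased = [-1.0, -0.5, 1.8]      # creates mass: penalized twice over

loss_exact = constrained_loss(target, target)
loss_biased = constrained_loss(biased, target)
```

The penalty weight trades regression accuracy against conservation; a hard
constraint (for example, computing one source term from the others) is an
alternative design.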

However, accommodating the chemical source terms of all species in a single network
may also require a more complex ANN architecture. Alternative strategies to reduce
this complexity have been used. One approach relies on adapting different ANNs for
different clusters of data, such as different networks for the reacting and the
non-reacting zones in the mixture. This approach has been implemented by Blasco
et al. (2000), Chatzopoulos and Rigopoulos (2013) and Franke et al. (2017) using
self-organizing maps (SOMs) (Kohonen 2013). In these studies, chemistry tabulation
was implemented in conjunction with closure models for turbulent combustion, and
the SOM was used as an adaptive tool to cluster similar conditions of the
composition space and to establish a single ANN regression table for each cluster.

SOMs are a popular unsupervised ML technique for clustering and model reduction,
as stated earlier. They are single-layer neural networks that connect inputs, which
correspond to the data to be clustered, to a (generally 2D) map of nodes
or clusters. The clustering of the input data is based on their weights relative to the
different nodes, which are determined iteratively by measuring their “proximity” 
to the node measures. The outcome of this iterative procedure is a mapping of the 
original data into a lower-dimensional space represented by the 2D map of nodes. 
The versatility of SOM in addressing how data is grouped is established through the 
choice of measures of similarity that are used to identify the mapping. For tabulation, 
these measures can be related to the proximity in thermo-chemical space (e.g. sim- 
ilar temperatures and compositions); while, for identifying different phases, these 
measures may rely on the evolution of marker thermo-chemical scalars in time and 
their correlations with other scalars. 
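
The iterative procedure can be sketched as follows. This is a minimal SOM, assuming
Euclidean distance in a normalized thermo-chemical space as the similarity measure
and a Gaussian neighborhood on the map; the map size, the learning-rate schedule and
the synthetic "cold" and "hot" states are illustrative assumptions, not settings
from the cited studies.

```python
import math
import random

def best_matching_unit(nodes, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(nodes, key=lambda pos: sum((w - xi) ** 2
                                          for w, xi in zip(nodes[pos], x)))

def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per node of the rows x cols map, randomly initialized.
    nodes = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # linear decay of rate and radius
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 1e-3)
        for x in data:
            bmu = best_matching_unit(nodes, x)
            for pos, w in nodes.items():
                grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))   # neighborhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return nodes

# Synthetic normalized states (e.g. temperature, product mass fraction).
cold = [(0.10, 0.10), (0.12, 0.08), (0.08, 0.11)]
hot = [(0.90, 0.90), (0.88, 0.93), (0.91, 0.87)]
nodes = train_som(cold + hot, rows=1, cols=2)
cold_node = best_matching_unit(nodes, cold[0])
hot_node = best_matching_unit(nodes, hot[0])
```

After training, each state is assigned to its best matching node, and a separate
regression could then be fitted to the states of each node.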

Alternatively, clustering was implemented to group thermo-chemical scalars of
similar behavior, such as the construction of an ANN for intermediates and another
for reactants and products (Owoyele et al. 2020). This approach attempts to construct
a minimum set of neural networks that are also less complex than the ones that
accommodate all thermo-chemical scalars.

An additional consideration for constructing ANNs for reaction rate regression is
the high variability of the input data (the thermo-chemical scalars) and the
output data (their chemical source terms), which results in strongly nonlinear
regressions that may require unnecessarily complex and deep ANNs. A potential way of
“taming” the data variability is to pre-process the input and the output data. Sharma
et al. (2020) used log-normalization to pre-process free radicals, which tend to skew
towards zero.
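
A minimal sketch of such a pre-processing step is given below. It assumes one
common form of log-normalization, a logarithm followed by min-max scaling to
[0, 1]; the exact transformation used by Sharma et al. (2020) is not reproduced
here, and the offset `eps` and the sample mass fractions are hypothetical.

```python
import math

def log_normalize(values, eps=1e-12):
    """Log-transform, then min-max scale to [0, 1]. eps guards against log(0)."""
    logs = [math.log(v + eps) for v in values]
    lo, hi = min(logs), max(logs)
    span = (hi - lo) or 1.0              # avoid division by zero for constant data
    return [(z - lo) / span for z in logs], (lo, hi)

def log_denormalize(scaled, bounds, eps=1e-12):
    """Invert log_normalize using the stored bounds."""
    lo, hi = bounds
    span = (hi - lo) or 1.0
    return [math.exp(s * span + lo) - eps for s in scaled]

# Hypothetical radical mass fractions spanning six orders of magnitude.
radical = [1e-10, 1e-8, 1e-6, 1e-4]
scaled, bounds = log_normalize(radical)
recovered = log_denormalize(scaled, bounds)
```

On a linear scale the first three samples would be nearly indistinguishable from
zero; after the transformation they are spread almost evenly over [0, 1].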

Finally, determining an optimum topology for a chemistry regression network is
not a trivial task. A shallow (one hidden layer) to moderately deep network may not
be sufficient to capture the functional complexity of the chemical source terms and
may result in “under-fitting”. Meanwhile, a much deeper network with numerous
neurons in its hidden layers may achieve better predictions, at an increased cost
of evaluating the network and of storing the trained weights.
It can also result in “over-fitting” when data is sparse or does not represent the
true variability of the accessed composition space.

Ihme et al. (2008, 2009) and Ihme (2010) proposed an approach to determine an
optimum artificial neural network (OANN) using the generalized pattern search (GPS)
method (Torczon 1997). The GPS method is a derivative-free optimization method that
generates a sequence of iterates for a prescribed objective function. The optimum
network in this method is defined by the choice of network parameters (number of
hidden layers, number of neurons in hidden layers) that minimizes the memory
requirements, the computational cost and the approximation error of the network.
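
The idea of a derivative-free search over network parameters can be sketched with
a simple compass search, which is one basic member of the pattern-search family and
not the full GPS algorithm used in the cited work. The objective below is a made-up
surrogate standing in for a combined measure of error, cost and memory; in practice
each evaluation would involve training and testing a network.

```python
def pattern_search(cost, start, step, lower, upper, min_step=1):
    """Derivative-free compass search on an integer grid.

    Polls +/- step along each coordinate, moves to any improving point and
    halves the step when no poll point improves the objective.
    """
    best, best_cost = tuple(start), cost(start)
    while step >= min_step:
        improved = False
        for d in range(len(best)):
            for sign in (+1, -1):
                trial = list(best)
                trial[d] += sign * step
                if not (lower[d] <= trial[d] <= upper[d]):
                    continue
                c = cost(trial)
                if c < best_cost:
                    best, best_cost, improved = tuple(trial), c, True
        if not improved:
            step //= 2
    return best, best_cost

def surrogate_cost(p):
    """Hypothetical objective with a minimum at 3 hidden layers, 24 neurons."""
    n_layers, n_neurons = p
    return (n_layers - 3) ** 2 + ((n_neurons - 24) / 8) ** 2

best, best_cost = pattern_search(surrogate_cost, start=(1, 8), step=4,
                                 lower=(1, 4), upper=(8, 64))
```

For this surrogate the search moves from (1, 8) to the minimum at (3, 24) without
evaluating any gradient.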

Nowadays, other automated tools can be used to help optimize a given network. 
These include the so-called automated machine learning (or AutoML) tools, such 
as the Keras Tuner, Auto-PyTorch and the AutoKeras tools (Hutter et al. 2019). 
However, special attention must be paid to the choice of the measure of convergence 
of the training schemes. 


3 Learning Reaction Mechanisms 


Machine learning tools are set to provide greater insight into (1) the discovery of
chemical pathways and key reactions in a mechanism, and (2) the reduction and
representation of chemical mechanisms. In this section, we review a number of
applications in which ML tools have been used for learning reaction mechanisms.


3.1 Learning Observables in Complex Reaction Mechanisms


Although, for many, the ultimate goal of understanding chemical mechanisms is to
develop ways to reduce them, developing a qualitative and quantitative understanding
of the important reaction pathways and the various stages of oxidation, and
identifying the main species and reactions important to this oxidation, are crucial
steps towards mechanism reduction. ML offers powerful tools to achieve these goals.

Clustering methods have been used in a different context by Blurock and co-workers
(Blurock 2004, 2006; Tuner et al. 2005; Blurock et al. 2010)
to identify the different mechanistic phases of fuel oxidation, which can be helpful
in devising reduced chemistry schemes for these different phases. In Blurock (2004,
2006), clustering based on reaction sensitivity is used to identify the different
phases of aldehyde combustion and the ignition stages of ethanol, respectively.
These studies exploit the presence of “similarity” between chemical states to
identify the phases where the associated species are dominant. Identifying such
phases can be important in several respects. For example, during the
high-temperature oxidation of complex hydrocarbon fuels, identifying the two
distinct phases of fuel pyrolysis and subsequent oxidation has enabled the
development of hybrid chemistry approaches (Wang et al. 2018) (see Sect. 3.4). A
less obvious distinction between the different phases of the low-temperature
oxidation of the same complex fuels can also reveal similar strategies to construct
hybrid chemistry descriptions by identifying representative or marker species for
each phase.

Insight into the physics from simulations or experiments can also provide a pathway
towards generalizing observations, such as among different fuel functional groups.
A recent study by Buras et al. (2020) used convolutional neural networks (CNNs)
to construct correlations between the time scales of low-temperature spontaneous
fuel oxidation and chemical species profiles, primarily for OH, HO2, CH2O and
CO2, from plug-flow reactors (PFRs) and the first-stage autoignition delay time
(IDT). In their study, the authors relied on PFR simulations of 23 baseline fuels
(18 pure fuels and 5 fuel blends) spanning a range of functional groups, including
alkanes, alkenes/aromatics, oxygenates and fuel blends. They used existing
mechanisms and perturbations of the parameters of these mechanisms to construct a
wide database of species profiles. The emphasis on OH and HO2 is motivated by their
role during the onset of spontaneous fuel oxidation. These intermediates exhibit
different behaviors for two general classes of fuels that show different
propensities to form OH and HO2 during their oxidation cycle, resulting in different
correlations between the time scales for spontaneous fuel oxidation and the
first-stage IDT, one showing comparable values between the two quantities and
another exhibiting a much slower first-stage IDT. These different propensities are
exhibited in the temporal profiles of these 2 intermediates as shown by Buras
et al. (2020).

Fig. 3 A schematic of the CNN architecture used by Buras et al. (2020) to construct
correlations between profiles of OH, HO2, CH2O and CO2 from PFR simulations of the
low-temperature oxidation of a range of fuels and the first-stage ignition delay
times (IDTs). Reproduced with permission from Buras et al. (2020)

CNNs are a different class of neural networks compared to the fully connected
multi-layer perceptrons shown in Figs. 1 and 2. They are specialized for
multi-dimensional inputs, such as 2D images, and include intermediate processing
layers, namely convolutional and pooling layers, that are designed to dissect
patterns in multi-dimensional and structured input data. Within the context of the
work by Buras et al. (2020), the CNN architecture captures the different patterns
within the profiles of the intermediates, OH and HO2.
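
The two processing steps mentioned above can be illustrated on a 1D profile. The
sketch below applies a single convolution filter, a ReLU activation and max
pooling; the synthetic OH-like profile and the hand-picked kernel are assumptions
for illustration only, whereas a trained CNN learns its kernels from data.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation, the operation used in CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Keep the largest value in each non-overlapping window of length size."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

# Synthetic, normalized OH-like temporal profile (hypothetical values).
profile = [0.0, 0.0, 0.1, 0.5, 1.0, 0.6, 0.2, 0.1, 0.0]
rise_kernel = [-1.0, 0.0, 1.0]     # responds to a rising signal

feature_map = relu(conv1d(profile, rise_kernel))
features = max_pool(feature_map, 2)
```

The pooled features are largest where the profile rises most steeply, which is the
kind of localized pattern a convolutional layer is designed to pick up.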

Figure 3 shows a schematic of the CNN architecture used by Buras et al. (2020)
to construct correlations between profiles of OH and HO2 from PFR simulations of
the low-temperature oxidation of a range of fuels and the first-stage ignition
delay times (IDTs). The input data corresponds to 1D profiles of both OH and HO2,
while the output (or target) is represented by the first-stage IDT. By using a CNN,
Buras et al. (2020) show that they can generate adequate predictions of the
first-stage IDT, as shown in Fig. 4.


3.2 Chemical Reaction Neural Networks


One of the more recent developments in ML for chemical kinetics is the
representation of reaction rates, with the thermo-chemical state as prescribed
input, in terms of neural networks (Barwey and Raman 2021; Ji and Deng 2021). Such a
representation enables the use of various tools to both accelerate the evaluation of
reaction rates and develop skeletal descriptions of detailed mechanisms.

The rate of progress of a global reaction,
$\nu_A A + \nu_B B \rightarrow \nu_C C + \nu_D D$, can be expressed as:

$$r = k\, C_A^{\nu_A} C_B^{\nu_B}, \tag{6}$$

where the rate constant $k$ is expressed in terms of the Arrhenius law:

$$k = A\, T^{b} \exp\left(-\frac{E_a}{RT}\right). \tag{7}$$

In this expression, $A$, $b$ and $E_a$ correspond to the frequency factor, the
pre-exponential temperature power and the activation energy. The rate of progress
can then be re-written as follows:

$$r = \exp\left(\ln k + \nu_A \ln C_A + \nu_B \ln C_B\right) \tag{8}$$

$$\phantom{r} = \exp\left(\ln A + b \ln T - \frac{E_a}{RT} + \nu_A \ln C_A + \nu_B \ln C_B\right) \tag{9}$$


This expression can be formulated as an artificial neural network as illustrated in
Fig. 5a for a single reaction and Fig. 5b for multi-step reactions. In Fig. 5a, the
network emulates the structure of an ANN with no hidden layers. In this network, the
input layer corresponds to the natural logs of the concentrations of A, B, C and D.
The output layer corresponds to their rates of change, $-\nu_A r$, $-\nu_B r$,
$\nu_C r$ and $\nu_D r$, respectively. The activation function is the exponential
function and the bias is $\ln k$. The stoichiometric coefficients $\nu_A$, $\nu_B$,
$\nu_C$ and $\nu_D$ correspond to the weights of the network. The bias $\ln k$,
which represents the temperature-dependent rate constant, incorporates the
contributions of the rate parameters $A$, $b$ and $E_a$. The illustrated CRNN can be
generalized to accommodate more reactions and more species, as shown in Fig. 5b,
thus enabling a neural network description of a set of global reactions to be
optimized via ANNs. However, perhaps the main advantages of the CRNN, beyond the
ability to frame reaction mechanisms within a neural network, are the potential
implications of such a network for chemistry reduction and acceleration. Ji and
Deng (2021) demonstrated a framework where the CRNN can be learned in the
context of neural ODEs as discussed in Sect. 4 below.
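
The equivalence between Eqs. (6) and (7) and the single neuron of Eq. (9) can be
checked numerically. In the sketch below, the rate parameters, temperature,
concentrations and stoichiometric coefficients are hypothetical values chosen for
illustration, and the input layout (log-concentrations, ln T and -1/(RT), with ln A
as the remaining bias) is one way to arrange the terms of Eq. (9).

```python
import math

R = 8.314  # universal gas constant, J/(mol K)

def mass_action_rate(A, b, Ea, T, conc, nu_react):
    """Direct evaluation of Eqs. (6) and (7)."""
    k = A * T ** b * math.exp(-Ea / (R * T))
    r = k
    for c, nu in zip(conc, nu_react):
        r *= c ** nu
    return r

def crnn_neuron(A, b, Ea, T, conc, nu_react, nu_net):
    """Eq. (9) as one neuron with an exponential activation function.

    Inputs: ln C_i, ln T and -1/(RT); weights: nu_i, b and Ea; bias: ln A.
    Output weights nu_net convert the rate of progress into species rates.
    """
    z = math.log(A) + b * math.log(T) + Ea * (-1.0 / (R * T))
    z += sum(nu * math.log(c) for nu, c in zip(nu_react, conc))
    r = math.exp(z)
    return [nu * r for nu in nu_net]

# Hypothetical global reaction A + 2B -> C + D (values are illustrative).
conc = [2.0, 1.0, 0.5, 0.1]          # concentrations of A, B, C, D
nu_react = [1.0, 2.0, 0.0, 0.0]      # input weights (products do not contribute)
nu_net = [-1.0, -2.0, 1.0, 1.0]      # output weights: -nu_A, -nu_B, nu_C, nu_D
params = dict(A=1.0e6, b=0.5, Ea=5.0e4, T=1500.0)

direct = mass_action_rate(conc=conc, nu_react=nu_react, **params)
rates = crnn_neuron(conc=conc, nu_react=nu_react, nu_net=nu_net, **params)
```

Both routes give the same rate of progress up to round-off, so the consumption
rate of A returned by the neuron equals the negative of the direct rate.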

Fig. 5 Illustration of the CRNN network by Ji and Deng (from Ji et al. (2021)):
(a) a neuron for a single-step reaction, combining the law of mass action and the
Arrhenius law; (b) a CRNN network for multi-step reactions. In the figure, the
symbols “[ ]” denote concentrations while the “dots” over the concentrations in the
output layer denote reaction rates. Reproduced with permission from Ji et al. (2021)

An additional advantage of the CRNN is the potential for chemistry reduction
via threshold pruning, where input and output weights below a certain threshold are
clipped to zero. This pruning enhances the sparsity of the CRNN, which in turn can
help speed up the evaluation of reaction rates. Ji and Deng (2021) showed that this
pruning can still recover the reaction rates in the CRNN accurately by re-balancing
the remaining weights.
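
The clipping step itself is simple, as the sketch below shows for a small weight
matrix. The weights and the threshold are hypothetical, and the re-balancing of the
remaining weights, which requires further training, is not included.

```python
def prune(weights, threshold):
    """Set weights whose magnitude is below the threshold to zero."""
    return [[w if abs(w) >= threshold else 0.0 for w in row]
            for row in weights]

def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)

# Hypothetical input weights of two reactions acting on four species.
weights = [[1.0, 0.003, 2.0, 0.0],
           [0.02, 1.0, -0.004, 1.0]]
pruned = prune(weights, threshold=0.01)
```

Here the two near-zero weights are removed and the sparsity increases from 1/8 to
3/8, while the weight 0.02 is kept because it lies above the threshold.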

A similar formulation was proposed by Barwey and Raman (2021). These authors
also recast Arrhenius kinetics as a neural network using matrix-based formulations.
In this way, the evaluation of the neural network can exploit specially optimized
libraries for machine learning that are also optimized for use with graphics
processing units (GPUs).


3.3 PCA-Based Chemistry Reduction and Other PCA 
Applications 


As indicated earlier, PCA has been one of the earliest ML tools implemented for
combustion chemistry. In the earlier work of Turanyi and co-workers (see, for
example, Vajda et al. 2006), PCA was used to identify the most influential reactions
in a mechanism through an eigen-decomposition related to the sensitivity matrix.
Their analysis is based on identifying the contributions to a “response function”:

$$Q(\alpha) = \sum_{k=1}^{l} \sum_{i=1}^{m} \left[ \frac{f_i(x_k, \alpha) - f_i(x_k, \alpha^0)}{f_i(x_k, \alpha^0)} \right]^2, \tag{10}$$

which evaluates the cumulative contribution of the normalized deviations of
perturbed kinetic model response parameters relative to the original non-perturbed
kinetic model (with parameters $\alpha^0$). Here, $f_i$ can correspond to
temperature, a measure of species concentrations, or both, and to other global
parameters, such as flame speeds or extinction strain rates. $\alpha_j$ is a
reaction rate kinetic parameter, which is normally adopted as the rate constant of
a reaction in the mechanism. Also, $l$ and $m$ in the sum correspond to the total
number of analysis points (in space or time) and the number of target functions
(e.g. species concentrations, temperatures). $x_k$ corresponds to the positions or
times that make up all the samples in the calculation of $Q$.

PCA is implemented on the matrix $\mathbf{S}^T \mathbf{S}$, where $\mathbf{S}$ is
the matrix of normalized sensitivity coefficients whose component i, j can be
expressed as $\partial \ln f_i / \partial \ln \alpha_j$. An eigen-decomposition of
the matrix yields a set of eigenvalues $\lambda_i$ (ordered from high to lower
magnitudes) and associated eigenvectors (which form an orthonormal set) and
principal components (PCs), $\boldsymbol{\varphi}$, which can be expressed in terms
of the kinetic parameters as:

$$\boldsymbol{\varphi} = \mathbf{U}^T \boldsymbol{\psi}, \tag{11}$$

where $\mathbf{U}$ is the matrix of eigenvectors and $\boldsymbol{\psi}$ is the
vector of logarithmic parameters $\psi_j = \ln \alpha_j$. The eigen-decomposition
can be used to approximate the response function $Q$ as follows (Vajda et al. 2006):

$$Q(\alpha) \approx \sum_{i=1} \lambda_i \left(\Delta \varphi_i\right)^2. \tag{12}$$

By ordering the eigenvalues, the PCs corresponding to the largest eigenvalues
determine the influential part of the mechanism.
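
The procedure can be sketched on a small, made-up sensitivity matrix. The code
below forms the product of the transposed sensitivity matrix with itself and
extracts its eigenpairs by power iteration with deflation, a simple method chosen
here only to keep the example free of external libraries; the sensitivity values
are hypothetical.

```python
import math

def gram_matrix(S):
    """Return S^T S for S given as a list of rows (responses x parameters)."""
    n = len(S[0])
    return [[sum(row[i] * row[j] for row in S) for j in range(n)]
            for i in range(n)]

def power_iteration(M, iters=500):
    """Dominant eigenpair of a symmetric positive semi-definite matrix."""
    n = len(M)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v

def eigenpairs(M, k):
    """First k eigenpairs, largest eigenvalue first, using deflation."""
    M = [row[:] for row in M]
    pairs = []
    for _ in range(k):
        lam, v = power_iteration(M)
        pairs.append((lam, v))
        for i in range(len(M)):
            for j in range(len(M)):
                M[i][j] -= lam * v[i] * v[j]   # remove the found component
    return pairs

# Hypothetical normalized sensitivities: 4 responses, 3 kinetic parameters.
S = [[1.0, 0.2, 0.01],
     [0.9, 0.3, 0.05],
     [1.1, 0.1, 0.02],
     [0.8, 0.4, 0.00]]
M = gram_matrix(S)
pairs = eigenpairs(M, k=3)
dominant = max(range(3), key=lambda j: abs(pairs[0][1][j]))
```

In this example the leading eigenvector is dominated by the first parameter, which
would therefore be flagged as the most influential one.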

PCA can also be implemented within the context of a neural network using
autoencoders. Figure 6 shows the architecture of an autoencoder with an input and an
output layer and 3 hidden layers. The hidden layers are implemented with a
decreasing number of neurons down to a bottleneck layer, then an increasing number
of neurons up to the output. The dimension of the output is identical to that of
the input and the values of its neurons are designed to reproduce the corresponding
values at the input layer. Therefore, the goal of an autoencoder is to reproduce the
original data (at the input) by representing the data through a reduced dimension
corresponding to the number of neurons in the bottleneck layer.

An autoencoder with one hidden layer (the bottleneck layer), a linear activation
function and the mean squared error (MSE) as its penalty function is designed
to reproduce the PCA space, mapping a prescribed input dimension to a dimension that
corresponds to the number of neurons in the hidden layer. Additional steps are needed
to reproduce the PCs from a PCA, given that PCA also requires an orthonormal
set of eigenvectors for the PCs.

Fig. 6 Illustration of the network architecture of an autoencoder


Recently, Zhang et al. (2021) proposed the use of autoencoders as a tool for
chemistry reduction. These autoencoders exploit the dimensionality reduction at
the bottleneck to construct a reduced description of chemistry. Given the inherent
risk when the autoencoder attempts to access out-of-distribution (OOD) regions via
extrapolation, Zhang et al. (2021) proposed the coupling of the
autoencoder with either a deep ensemble (DE) method (Lakshminarayanan et al.
2017) or the so-called PI3NN method (Zhang et al. 2021). Within an autoencoder
structure, the DE method accounts for a predicted mean (the predicted values) as well
as the output variance to assess uncertainty (Lakshminarayanan et al. 2017). In the
PI3NN method, two additional neural networks are introduced to estimate the
upper and lower bounds of the data reconstruction, again as a measure to assess the
uncertainty in the autoencoder performance.

Figure 7 illustrates the two OOD-aware autoencoder configurations investigated 
by Zhang et al. (2021). The authors showed that by using these configurations, the 
number of input species is reduced from 12 to 2 at the bottleneck. This reduction can 
translate into a reduction in the number of transported scalars. 

Finally, another implementation of PCA in combustion chemistry has been proposed
by D'Alessio et al. (2020a, b). In their recent studies, they proposed an adaptive
reduced chemistry scheme in which the composition space is partitioned into
different clusters where appropriate and efficient reduced chemistry models can be
implemented. Instead of using a standard clustering approach such as K-Means or
SOM, the partitioning is implemented using local PCA (or LPCA) (Kambhatla and
Leen 1997). The main difference between LPCA and K-Means, for example,
is in the criterion used to partition the composition space. Instead of minimizing
the Euclidean error between the data of a given cluster and its centroid, the
criterion is to minimize the reconstruction error of the PCA within a given cluster.
D'Alessio et al. (2020b) showed that adopting LPCA as part of the clustering
algorithm gives superior performance compared to a hybrid clustering approach
based on the coupling of self-organizing maps (SOMs) and K-Means in an unsteady
laminar co-flow diffusion flame of methane in air. Within the context of a CFD
simulation, LPCA is used as a classifier to determine the cluster to which a given
cell state belongs. In each cluster, an a priori chemistry reduction is implemented
using the training data, which in the studies of D'Alessio et al. (2020a, b)
correspond to a series of unsteady 1D flames or data from 2D simulations of the
same configuration, respectively.

Fig. 7 Illustration of two OOD-aware autoencoder architectures with DE (left) and PI3NN (right).
The input layer, x corresponds to the full chemistry description; while the bottleneck z represents
the reduced chemical description. The autoencoder is designed to reproduce the input in the output;
and the DE and PI3NN modifications attempt to assess the uncertainty of the predictions, especially
when extrapolation is needed. Reproduced with permission from Zhang et al. (2021)
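
The difference between the LPCA and K-Means assignment criteria can be made concrete with a small sketch. It assumes that each cluster is described by a centroid and a set of orthonormal principal directions that have already been computed. The names `lpca_assign` and `nearest_centroid` and the two-dimensional data are hypothetical and chosen only so that the two criteria disagree.

```python
def reconstruction_error(x, centroid, basis):
    """Squared PCA reconstruction error of x within one cluster.

    `basis` is a list of orthonormal principal directions (assumed given).
    """
    d = [xi - ci for xi, ci in zip(x, centroid)]
    recon = [0.0] * len(d)
    for v in basis:
        coeff = sum(di * vi for di, vi in zip(d, v))  # projection on v
        recon = [ri + coeff * vi for ri, vi in zip(recon, v)]
    return sum((di - ri) ** 2 for di, ri in zip(d, recon))


def lpca_assign(x, centroids, bases):
    """Assign x to the cluster with the smallest reconstruction error."""
    errors = [reconstruction_error(x, c, b) for c, b in zip(centroids, bases)]
    return min(range(len(errors)), key=errors.__getitem__)


def nearest_centroid(x, centroids):
    """K-Means style assignment based on Euclidean distance to centroids."""
    dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
    return min(range(len(dists)), key=dists.__getitem__)


# Cluster 0 is elongated along the first axis, cluster 1 along the second
centroids = [(0.0, 0.0), (3.0, 3.0)]
bases = [[(1.0, 0.0)], [(0.0, 1.0)]]
state = (4.0, 0.1)

print(lpca_assign(state, centroids, bases))   # cluster chosen by LPCA
print(nearest_centroid(state, centroids))     # cluster chosen by distance
```

In this example the state lies close to the principal direction of cluster 0 but nearer to the centroid of cluster 1, so the two criteria select different clusters.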


3.4 Hybrid Chemistry Models and Implementation of ML Tools


The oxidation chemistry of a typical transportation fuel poses severe computational
challenges for multi-dimensional reacting flow simulations. These challenges may
be attributed primarily to the sheer size of the associated chemical mechanisms, when
they are available. Often, however, the chemical kinetic data are not available at all.
While chemistry reduction strategies have been reasonably successful in overcoming
the challenge of handling chemical complexity (Battin-Leclerc 2008; Turanyi and
Tomlin 2014), such strategies can only be used when reliable detailed mechanisms
for the fuels of interest are available.

Experimental data-based chemistry reduction is one viable strategy for modeling
the chemistry of complex fuels. Recently, the hybrid chemistry (HyChem) approach
was proposed by Wang and co-workers (Wang et al. 2018; Xu et al. 2018; Tao
et al. 2018; Wang et al. 2018; Saggese et al. 2020; Xu et al. 2020; Xu and
Wang 2021) as a chemistry reduction approach for the high-temperature oxidation
of transportation fuels. It starts from time-series measurements of fuel fragments
(and other relevant species) to capture the pyrolysis stage of these fuels. Such
measurements can be obtained primarily using shock tubes and a variety of optical
diagnostic techniques and sampling methods.

The approach is based on the premise that, at high temperatures, fuel oxidation
undergoes: (1) a fast fuel pyrolysis step resulting in the formation of smaller fuel
fragments, followed by (2) a longer oxidation step for these fragments. Figure 8
shows experimental observations by Davidson et al. (2011), which illustrate the two
stages of n-dodecane oxidation through time-history measurements of the fuel, a fuel
fragment, C2H4, and oxidation species, OH, H2O and CO2. The figure shows that
the fuel is depleted in the first 30 μs and is replaced by pyrolysis fragments, which
eventually oxidize towards simpler hydrocarbons.

In HyChem, a hybrid chemistry model represented by a set of lumped fuel pyrolysis
steps is augmented by foundational C0-C4 chemistry for the oxidation of the fragments.
With experimental measurements of the key fragments, the stoichiometric coefficients
and rate constants for the global reactions are determined through an optimization
approach. The lumped reactions for the fuel pyrolysis are modeled using the
following two reaction steps for a fuel $\mathrm{C}_m\mathrm{H}_n$:


• Unimolecular decomposition reaction

$$
\mathrm{C}_m\mathrm{H}_n \rightarrow e_d\,(\mathrm{C_2H_4} + \lambda_3\,\mathrm{C_3H_6} + \lambda_4\,\mathrm{C_4H_8})
+ b_d\,[\chi\,\mathrm{C_6H_6} + (1-\chi)\,\mathrm{C_7H_8}] + \alpha\,\mathrm{H} + (2-\alpha)\,\mathrm{CH_3} \qquad (13)
$$

• H-atom abstraction and β-scission reactions of fuel radicals

$$
\mathrm{C}_m\mathrm{H}_n + \mathrm{R} \rightarrow \mathrm{RH} + \gamma\,\mathrm{CH_4} + e_a\,(\mathrm{C_2H_4} + \lambda_3\,\mathrm{C_3H_6} + \lambda_4\,\mathrm{C_4H_8})
+ b_a\,[\chi\,\mathrm{C_6H_6} + (1-\chi)\,\mathrm{C_7H_8}] + \beta\,\mathrm{H} + (1-\beta)\,\mathrm{CH_3} \qquad (14)
$$

where R represents the following species: H, CH3, O, OH, O2 and HO2. In these
reactions, $\alpha$, $\beta$, $\lambda_3$, $\lambda_4$ and $\chi$ are the stoichiometric parameters that need to be
determined for each fuel chemistry. More specifically, $\alpha$ and $\beta$ correspond to the number
of H atoms per $\mathrm{C}_m\mathrm{H}_n$ in the two reactions, respectively. The remaining parameters,
$e_d$, $e_a$, $b_d$ and $b_a$ can be expressed in terms of the stoichiometric parameters using
elemental conservation principles across each reaction (Wang et al. 2018).
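
The elemental conservation step can be sketched for the unimolecular decomposition reaction. The sketch assumes the products are C2H4, C3H6, C4H8, C6H6, C7H8, H and CH3 with the stoichiometry of the decomposition reaction given above, and solves the carbon and hydrogen balances for e_d and b_d. The closed forms follow from those two balances under that assumption; they are a worked illustration rather than expressions quoted from Wang et al. (2018), and the parameter values are hypothetical.

```python
def pyrolysis_coefficients(m, n, alpha, lam3, lam4, chi):
    """Solve the C and H balances of the decomposition step for e_d and b_d.

    Assumed reaction (see lead-in):
      CmHn -> e_d (C2H4 + lam3 C3H6 + lam4 C4H8)
              + b_d [chi C6H6 + (1 - chi) C7H8] + alpha H + (2 - alpha) CH3
    """
    # Subtracting twice the C balance from the H balance eliminates e_d
    b_d = (2.0 * m - n + 2.0) / 6.0
    alkene_carbons = 2.0 + 3.0 * lam3 + 4.0 * lam4
    e_d = (m - 2.0 + alpha - b_d * (7.0 - chi)) / alkene_carbons
    return e_d, b_d


def product_atoms(e_d, b_d, alpha, lam3, lam4, chi):
    """Count C and H atoms on the product side as a consistency check."""
    carbon = (e_d * (2 + 3 * lam3 + 4 * lam4)
              + b_d * (6 * chi + 7 * (1 - chi))
              + (2 - alpha))
    hydrogen = (e_d * (4 + 6 * lam3 + 8 * lam4)
                + b_d * (6 * chi + 8 * (1 - chi))
                + alpha + 3 * (2 - alpha))
    return carbon, hydrogen


# Hypothetical stoichiometric parameters for a C11H22 surrogate formula
alpha, lam3, lam4, chi = 0.5, 0.3, 0.1, 0.5
e_d, b_d = pyrolysis_coefficients(11, 22, alpha, lam3, lam4, chi)
print(e_d, b_d, product_atoms(e_d, b_d, alpha, lam3, lam4, chi))
```

For a saturated alkane such as n-dodecane (m = 12, n = 26), the same balances give b_d = 0, that is, no aromatic products in this lumped step.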

The HyChem approach relies on the ability to measure some key fuel fragments,
CH4, C2H4 (in shock tubes), C3H6, C4H8 isomers, C6H6 and C7H8 (in flow reactors).
These fuel fragments are much less complex species than the original
fuel, and their oxidation can be modeled using a simpler foundational chemistry model
as the subsequent oxidation stage. More importantly, the fragments' measurements
can be used to determine the stoichiometric parameters and the rate constants of the
lumped reactions needed to model the pyrolysis stage.

Within hybrid chemistry approaches such as HyChem, ML can play a useful role
in formulating robust chemistry descriptions for complex fuels. In two recent
studies, Ranade and Echekki (2019a,b) proposed an ANN-based implementation of
HyChem. In a first step, a shallow regression ANN is fitted to the temporal
species measurements to evaluate directly their rate of change, which directly
measures their rate of reaction. In the second step, deep regression ANNs are trained
to relate the fragments' concentrations to their rates of reaction. This network, as in the
HyChem approach, is used to evaluate the fragments' chemical source terms during
the pyrolysis stage. Ranade and Echekki (2019b) showed that the procedure can
be extended beyond the pyrolysis stage to enable the use of a simpler foundational
chemistry.

More recently, Echekki and Alqahtani (2021) proposed a data-based hybrid chem- 
istry approach to accelerate chemistry integration during the high-temperature oxida- 
tion of complex fuels. The approach is based on the ANN regression of representative 
species, which may or may not include the pyrolysis fragments, during the pyrolysis 
stage. These representative C0-C4 species are determined using reactor simulation
data and PCA on all species reaction rates. This PCA is used to determine the most 
important species to represent the evolution of the oxidation process. Beyond the 
pyrolysis stage, these species can be modeled with a foundational chemistry model 
like the remaining species. 

Since the representative species are not tied to a particular list of fragments, 
the approach can be extended to the modeling of low-temperature oxidation where 
some of the initial intermediates are fuel-dependent. The work of Alqahtani (2020) 
demonstrated the feasibility of this extension to low-temperature fuel oxidation. 

The approaches implemented in Ranade and Echekki (2019a,b), Echekki and
Alqahtani (2021) or Alqahtani (2020) rely on ANNs for the regression of the fragments
or representative species in terms of the species concentrations. These studies suggest
that the associated ANN architectures can be further simplified by using a subset
of these species as inputs. This choice is motivated by the inherent correlations among
the fragments/representative species and relies on the same motivation as the use of PCA
in combustion modeling. However, ANNs may have limited interpretability unless
they are implemented in the context of CRNN, as presented in Sect. 3.2.

The CRNNs (Ji and Deng 2021) offer an alternative optimization of the global
reactions of the pyrolysis stage using the law of mass action and the Arrhenius
form for the rate constants. Zanders et al. (2021) implemented a stochastic gradient
descent (SGD) approach to optimize the lumped global reactions of pyrolysis starting
from ignition delay time (IDT) data. Their approach was implemented within their
Arrhenius.jl open-source software (Ji and Deng 2021), with the lumped reaction
steps of pyrolysis expressed as a CRNN. Their evaluation of the rate
parameters of the lumped pyrolysis reactions yielded both enhanced computational
efficiency compared to approaches based on genetic algorithms and improved
predictions of IDT over ranges of temperatures and equivalence ratios.
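
The reason the law of mass action and the Arrhenius form fit naturally into a neural-network structure is that the logarithm of the reaction rate is a weighted sum of inputs. The sketch below illustrates this with a single reaction. The function names and parameter values are hypothetical, and the code is not taken from Arrhenius.jl or from Ji and Deng (2021).

```python
import math

R_GAS = 8.314462618  # universal gas constant, J/(mol K)


def crnn_log_rate(log_conc, temperature, nu, ln_A, b, Ea):
    """Log reaction rate written as one 'neuron'.

    Inputs are ln(concentration), ln(T) and -1/(R T); the weights are the
    reaction orders nu, the temperature exponent b and the activation
    energy Ea; ln_A acts as the bias.
    """
    weighted_sum = sum(n * lc for n, lc in zip(nu, log_conc))
    return (weighted_sum + ln_A + b * math.log(temperature)
            + Ea * (-1.0 / (R_GAS * temperature)))


def mass_action_rate(conc, temperature, nu, A, b, Ea):
    """Same rate from the usual mass-action / modified Arrhenius formula."""
    k = A * temperature ** b * math.exp(-Ea / (R_GAS * temperature))
    rate = k
    for c, n in zip(conc, nu):
        rate *= c ** n
    return rate


# Hypothetical two-species reaction with non-integer order
conc = [0.5, 2.0]
nu = [1.0, 0.5]
T, A, b, Ea = 1200.0, 1.0e8, 0.5, 1.5e5

neuron = math.exp(crnn_log_rate([math.log(c) for c in conc], T, nu,
                                math.log(A), b, Ea))
print(neuron, mass_action_rate(conc, T, nu, A, b, Ea))
```

Because the weights map one-to-one onto reaction orders and Arrhenius parameters, the trained parameters remain physically interpretable.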


3.5 Extending Functional Groups for Kinetics Modeling 


Functional group information has recently been used for the bottom-up development
of chemical kinetic models. This approach was developed following the initial
insight that AI models can predict combustion properties from several key functional
group features of a fuel mixture. Recently, Zhang and co-workers advanced a
lumped fuel chemistry modeling approach using functional groups for mechanism
development (FGMech) (Zhang and Sarathy 2021b; Zhang et al. 2021). They created
a functional group-based approach, which can account for mixture variability and
predict stoichiometric parameters of chemical reactions without the need for any
tuning against experiments on the real fuel.

Figure 9 presents an overview of the functional group approach for kinetic model 
development. The effects of functional groups on the stoichiometric parameters 
and/or yields of key pyrolysis products were identified and quantified based on pre- 
vious modeling of pure components (Zhang and Sarathy 2021a; Zhang et al. 2022; 
Zhang and Sarathy 2021c). A quantitative structure-yield relationship was developed 
by a multiple linear regression (MLR) model, which was used to predict the stoichio- 
metric parameters and/or yields of key pyrolysis products based on ten input features 
(eight functional groups, molecular weight, and branching index). The approach was 
then extended to predict thermodynamic data, lumped reaction rate parameters and 
transport data based on the functional-group characterization of real fuels. FGMech 
is fundamentally different in that no parameters need to be tuned to match actual 
real-fuel pyrolysis/oxidation data, and all the model parameters were derived only 
from functional group data. It was shown that the FGMech approach can make good 
predictions on the reactivity of various aviation, gasoline, and diesel fuels (Zhang 
and Sarathy 2021b; Zhang et al. 2021).

Fig. 9 Overview of the functional group approach for kinetic model development


3.6 Fuel Properties’ Prediction Using ML 


The properties of fuels are carefully controlled to enable engines to operate at their
optimal conditions and to ensure that fuels can be safely handled and stored. Important
properties range from those that can be easily determined based on simple
thermophysical models and linear blending (e.g., density, viscosity, heating values) to more
complex properties that cannot be easily determined from physical modeling (e.g.,
octane number, cetane number, and sooting tendency). For the latter, ML techniques
may be used to predict these fuel properties.

The first requirement for fuel property prediction is a suitable input descriptor for
model training. Various 1D to 3D molecular representations such as SMILES (Simplified
Molecular Input Line Entry System), InChI (International Chemical Identifier)
or connectivity matrices can be used to obtain molecular descriptors for AI-based
quantitative structure-property relationships (QSPR). Table 1 illustrates the use of
different ML approaches to evaluate fuel properties.

Abdul Jameel et al. have demonstrated significant progress in the use of ANNs to 
predict various fuel properties including octane numbers (Jameel et al. 2018), derived 
cetane number (Jameel et al. 2016, 2021), flash point (Aljaman et al. 2022), and 
sooting indices (Jameel 2021). In general, they used functional groups derived from 
1H NMR spectra of pure hydrocarbons and real fuel mixtures as input descriptors
for model training, as illustrated in Fig. 10. The functional groups used include nine 
structural descriptors (paraffinic primary to tertiary carbons, olefinic, naphthenic, 
aromatic and ethanolic OH groups, molecular weight and branching index). Ibrahim 
and Farooq (2020, 2021) utilized the methodology proposed by Abdul Jameel et al.
for fuel property (RON, MON, DCN, H/C ratio) prediction based on infrared (IR)
absorption spectra rather than NMR shifts.

Table 1 Example of fuel properties predicted by AI and associated descriptors

| Species property | ML approach | Reported descriptors |
|---|---|---|
| Octane/Cetane Number, autoignition metrics | ANN based group contributions, SVM based on Boruta features elimination, CNN, Graph NN, ANN, k-NN, RF, HDMR/CNN | Molecular weight, critical volume, Balaban/Kier-Hall/Wiener index, water/octanol partitioning |
| HHV, LHV, HoV | GC, SVM, ANN, Ant colony-PLS-MLR, MLR, GA-SVM | Van der Waals surface area, number of carbons |
| Soot index, exhaust after-treatment activity | Bayesian inference of GC, PCA-ANN, SVM, RF, PLS Transform | LUMO-HOMO energy, functional groups |

Fig. 10 Conversion of NMR spectra to functional groups followed by training of an ML model for
property prediction


3.7 Transfer Learning for Reaction Chemistry 


Chemical kinetic modelling is an indispensable tool for our understanding of the 
formation and composition of complex mixtures. These models are routinely used to 
study pollution, air quality, and combustion systems. Recommendations from kinetic 
models often help shape and guide environmental policies and future research direc- 
tions. There are two essential data feeds for such models: species thermochemistry 
and rate coefficients of elementary reactions. Uncertainties in these feeds directly 
affect the predictive accuracy of chemical kinetic models. Historically, these data 
were measured experimentally and/or estimated from simple rules, such as group- 
additivity and structure-activity-relations. Ab-initio quantum chemistry based theo- 
retical models have been developed over the years to calculate thermochemistry and 
reaction rate coefficients, and the accuracies of these calculations have been increasing
steadily. These methods, however, require significant computational power and
are challenging to apply to large molecular systems. In recent times, machine-learning
based methods have attracted significant attention for the prediction of
thermochemistry and reaction rate coefficients. In particular, inspired by the success
of the transfer learning approach in image processing, researchers have applied it
in the domain of reaction chemistry. Transfer learning applies the knowledge (or
model) learned in one task to another task. One of the benefits of transfer learning
is that it can overcome the lack of large datasets, which are generally needed for
machine learning algorithms.

Grambow et al. (2019) trained three base models, one each for enthalpy of for- 
mation, entropy and heat capacity, on a large dataset (~130,000) generated from 
low-level (high uncertainty) theoretical calculations. These base models were then
used as the starting models for the prediction of more accurate values of those
thermochemistry properties by using a much smaller (<10,000) dataset of experimental
values and high-accuracy theoretical calculations (see Fig. 11). Bhattacharjee and 
Vlachos (2020) implemented a ‘data fusion’ methodology to map thermo-chemical 
quantities, calculated at various levels of theory, to a higher level of theory. Zhong 
et al. (2022) overcame the challenge of small datasets by transferring knowledge 
among them for predictions with higher accuracy (see Fig. 12). The authors also 
compared their results with two other similar approaches, namely multitask learn- 
ing and image-based transfer learning. Likewise, Han and Choi (2021) presented a 
framework of leveraging the learning from a large simulated database (with high 
uncertainty) to a small experimental database (with small uncertainty) for reliably 
predicting NMR (nuclear magnetic resonance) chemical shifts over a wide range of 
chemical space.

Fig. 12 Transfer learning approach for combining small datasets. Reproduced with permission
from Zhong et al. (2022)
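
The pattern shared by these studies, pre-training on a large low-accuracy dataset and fine-tuning on a small high-accuracy one, can be sketched with a deliberately simple one-parameter model. Everything in the example is an assumption made for illustration: the linear model y = w x, the synthetic data, the learning rates and the function name `fit_gd`. The cited works use neural networks and real thermochemistry data.

```python
def fit_gd(xs, ys, w_init, lr, n_steps):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = w_init
    n = len(xs)
    for _ in range(n_steps):
        grad = 2.0 / n * sum(x * (w * x - y) for x, y in zip(xs, ys))
        w -= lr * grad
    return w


# Large low-fidelity dataset with a systematic bias (slope 1.8 instead of 2.0)
x_low = [0.1 * i for i in range(1, 101)]
y_low = [1.8 * x for x in x_low]

# Small high-fidelity dataset with the target slope of 2.0
x_high = [1.0, 2.0, 3.0]
y_high = [2.0, 4.0, 6.0]

# Step 1: base model trained on the low-fidelity data
w_base = fit_gd(x_low, y_low, w_init=0.0, lr=0.01, n_steps=500)

# Step 2: a few fine-tuning steps on the high-fidelity data,
# starting either from the base model or from scratch
w_transfer = fit_gd(x_high, y_high, w_init=w_base, lr=0.05, n_steps=5)
w_scratch = fit_gd(x_high, y_high, w_init=0.0, lr=0.05, n_steps=5)

print(w_base, w_transfer, w_scratch)
```

With the same small training budget, the model initialized from the base weights ends closer to the high-fidelity slope than the model trained from scratch, which is the benefit transfer learning aims for.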


More recently, Ibrahim and Farooq (2022) showcased a temperature-dependent
multi-target model with a custom-made Arrhenius loss applied to the AtmVOCkin
reaction rate dataset. The Arrhenius loss enforces a physically sound temperature
dependence, which reduces overfitting and makes use of all available data in the
literature. The model outputs the three Arrhenius parameters, which are compatible
with modern automated chemical mechanism generator inputs. The graph-based D-MPNN was
used for transfer learning from the publicly available QM9 dataset, which stretches
the applicability domain and supplements fixed molecular descriptors. Multi-target
predictions were also implemented to enable cross-reaction learning, which can
enhance predictive capability for reactions with small datasets. Tuning was done
using Bayesian optimization, which gives robust/automatic predictions and a fair
comparison among various models. The model was used to predict the three
modified-Arrhenius parameters for the temperature-dependent reactions of OH, O3, NO3 and
Cl with a wide range of hydrocarbons (see Fig. 13).
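
One plausible form of such a loss is sketched below. It assumes the model outputs the three modified-Arrhenius parameters (ln A, n, Ea) and that the loss is the mean squared error in ln k over the temperatures at which rate data exist. This is an illustrative assumption about the structure of an Arrhenius-type loss and not the exact loss of Ibrahim and Farooq (2022); the parameter values are hypothetical.

```python
import math

R_GAS = 8.314462618  # universal gas constant, J/(mol K)


def ln_k(params, temperature):
    """ln k(T) from modified-Arrhenius parameters (ln A, n, Ea)."""
    ln_A, n, Ea = params
    return ln_A + n * math.log(temperature) - Ea / (R_GAS * temperature)


def arrhenius_loss(pred_params, temperatures, ln_k_ref):
    """Mean squared error in ln k over all available temperatures.

    Every data point constrains the same three parameters, so the
    predicted temperature dependence always has the Arrhenius form.
    """
    errors = [(ln_k(pred_params, T) - ref) ** 2
              for T, ref in zip(temperatures, ln_k_ref)]
    return sum(errors) / len(errors)


# Hypothetical reference data generated from known parameters
true_params = (math.log(1.0e10), 1.5, 2.0e4)
temps = [300.0, 500.0, 1000.0]
reference = [ln_k(true_params, T) for T in temps]

print(arrhenius_loss(true_params, temps, reference))
print(arrhenius_loss((math.log(1.0e10), 1.5, 3.0e4), temps, reference))
```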


4 Chemistry Integration and Acceleration 


Chemistry integration represents a true bottleneck in combustion simulations involv- 
ing both transport and chemistry. Measures to accelerate chemistry have adopted 
different strategies that are often combined with an initial step of chemistry reduc- 
tion to global or skeletal mechanisms. Such strategies include chemistry tabulation, 
such as the use of in situ adaptive tabulation (ISAT) (Pope 1997), regression (such as 
ANN-based regression discussed in Sect. 2) and the piecewise reusable implementa- 
tion of solution mapping (PRISM) (Tonse et al. 2003), adaptive chemistry, including 
dynamic approaches (see for example Liang et al. (2009), Continuo et al. (2011), Sun
and Ju (2017) and D'Alessio et al. (2020a)), and manifold-based methods, such as
intrinsic low-dimensional manifolds (ILDM) (Maas and Pope 1992) and computational
singular perturbation (CSP) methods (Lam and Goussis 1994). Chemistry acceleration
primarily relies on operator splitting of the chemical source terms, resulting in
the solution of ordinary differential equations (ODEs).

Fig. 13 Reaction rate prediction scheme (with toluene shown as a representative molecule).
(Courtesy of Ibrahim and Farooq (2022))
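
Operator splitting, mentioned above as the basis of chemistry acceleration, can be illustrated on a scalar toy problem. The sketch assumes the model equation dy/dt = s - k y, where the constant s stands in for transport and -k y for a stiff chemical source term; the chemistry sub-step is then an ODE that is solved on its own. The problem, the parameter values and the function names are hypothetical.

```python
import math


def lie_step(y, dt, k, s):
    """First-order splitting: transport sub-step, then chemistry sub-step."""
    y = y + s * dt                  # transport (constant source)
    return y * math.exp(-k * dt)    # chemistry ODE dy/dt = -k y, solved exactly


def strang_step(y, dt, k, s):
    """Second-order (Strang) splitting: half transport, chemistry, half transport."""
    y = y + 0.5 * s * dt
    y = y * math.exp(-k * dt)
    return y + 0.5 * s * dt


def exact(y0, t, k, s):
    """Analytical solution of dy/dt = s - k y."""
    return s / k + (y0 - s / k) * math.exp(-k * t)


def march(step, y0, t_end, n_steps, k, s):
    """Advance the split scheme over n_steps equal time steps."""
    y, dt = y0, t_end / n_steps
    for _ in range(n_steps):
        y = step(y, dt, k, s)
    return y


k, s, y0, t_end, n_steps = 50.0, 1.0, 1.0, 1.0, 100
reference = exact(y0, t_end, k, s)
err_lie = abs(march(lie_step, y0, t_end, n_steps, k, s) - reference)
err_strang = abs(march(strang_step, y0, t_end, n_steps, k, s) - reference)
print(err_lie, err_strang)
```

In a reacting flow solver the chemistry sub-step is the expensive part, since it requires integrating a stiff ODE system in every cell, which is what the methods in this section aim to accelerate.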

In the last few years, there has been a growing excitement about the potential of 
neural ODE (NODE) solutions (Chen et al. 2018; Rackauckas et al. 2020). NODEs 
construct solutions for ODEs using neural networks and ODE solvers where model 
parameters (i.e. weights) are evaluated by a backward solution of the adjoint state. 
Implementing NODEs for combustion reactions presents numerous challenges associated
with the inherent stiffness of the ODEs and the requirement for the simultaneous
solution of multiple ODEs for species and energy (Kim et al. 2021).

However, there have been several attempts in recent years to implement chem- 
istry integration with neural networks. Owoyele and Pal (2022) proposed the so- 
called ChemNODE approach. The implementation of ChemNODE is summarized 
in Fig. 14. In ChemNODE, a stiff ODE solver is used to advance the solution of a 
thermo-chemical state at different time increments. These solutions constitute the 
observations that are used to train for the reaction rates implemented on the right 
column of the figure. These ANN-based reaction rates are integrated as well using the 
same ODE solver. The loss function to be minimized is the mean squared error com- 
paring the solutions at the various observation points based on integration with the 
Arrhenius law and integration with the ANN-based reaction rates. Recognizing the 
difficulty of learning chemical sources within the proposed ChemNODE approach, 
Owoyele and Pal (2022) used a progressive approach for training these terms where 
each species is trained sequentially while the remaining species’ source terms are 
modeled with the solution from the ODE solver based on the Arrhenius law. Moreover,
the optimization process involves the evaluation of derivatives of the neural
network solution with respect to the network parameters. For this purpose, Owoyele
and Pal (2022) adopted a forward-mode continuous sensitivity analysis using packages
available for the Julia language.

Fig. 14 Illustration of the ChemNODE algorithm. Reproduced with permission from Owoyele and
Pal (2022)
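
The structure of the ChemNODE loss, which compares integrated solutions rather than source terms, can be sketched with a scalar surrogate. The example makes several simplifying assumptions that differ from Owoyele and Pal (2022): a fixed-step RK4 integrator replaces the stiff solver, a one-parameter rate law replaces the ANN, and a scan over candidate values replaces gradient-based training. All names and values are hypothetical.

```python
def rk4_trajectory(rhs, y0, t_obs, substeps=50):
    """Integrate dy/dt = rhs(y) and return y at each observation time."""
    y, t, out = y0, 0.0, []
    for t_target in t_obs:
        h = (t_target - t) / substeps
        for _ in range(substeps):
            k1 = rhs(y)
            k2 = rhs(y + 0.5 * h * k1)
            k3 = rhs(y + 0.5 * h * k2)
            k4 = rhs(y + h * k3)
            y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t = t_target
        out.append(y)
    return out


def trajectory_mse(rhs_ref, rhs_model, y0, t_obs):
    """Mean squared error between two integrated solutions."""
    ref = rk4_trajectory(rhs_ref, y0, t_obs)
    mod = rk4_trajectory(rhs_model, y0, t_obs)
    return sum((a - b) ** 2 for a, b in zip(ref, mod)) / len(t_obs)


def make_rate(k):
    """One-parameter stand-in for a learned source term."""
    return lambda y: -k * y


reference_rate = make_rate(3.0)     # stand-in for the Arrhenius source term
t_obs = [0.1, 0.2, 0.5]
candidates = [2.0, 2.5, 3.0, 3.5]

losses = {k: trajectory_mse(reference_rate, make_rate(k), 1.0, t_obs)
          for k in candidates}
best_k = min(losses, key=losses.get)
print(best_k, losses)
```

Because the loss is evaluated on the integrated solution, errors in the source term that would accumulate over time are penalized directly.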

An alternative procedure for accelerating chemistry integration is proposed by 
Galassi et al. (2022). Their acceleration strategy is built on the use of CSP to 
remove the fast time scales from the chemistry integration. CSP usually requires 
the evaluation of a Jacobian matrix for the local chemical source terms and its 
eigen-decomposition. This decomposition is needed to identify the fast and slow 
timescales of the chemical system. By projecting the fast time scales out of the 
chemistry integration, the inherent stiffness of this chemical system is significantly 
reduced. However, there is an inherent cost to the evaluation of the Jacobian and 
the process of its eigen-decomposition, which scales strongly with the size of the 
chemical mechanism. Galassi et al. (2022) proposed the use of ANN regression as a
cheaper surrogate for the local projection basis. Otherwise, the standard CSP
procedure is adopted. Figure 15 shows the general algorithm used to integrate
chemistry within the proposed CSP-ANN framework. Given a current chemical state, the
CSP basis is retrieved using the ANN. The training for this basis is implemented
offline; in the Galassi et al. (2022) study, it was carried out using 0D ignition data
for hydrogen-air mixtures. The procedure then involves a "radical
correction" to account for the fast time scales, an explicit integration using the
projection onto the slow invariant manifold, then another radical correction. For the
9-species mechanism, 7 neural networks are trained in the Galassi et al. (2022) study.
Each features 2 hidden layers with 128 neurons per layer.

Fig. 15 Illustration of the CSP-ANN algorithm. Reproduced with permission from Galassi et al.
(2022)
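
The role of the Jacobian eigen-decomposition in separating fast and slow time scales can be illustrated on a two-variable linear system. The sketch computes the time scales as the inverse magnitudes of the eigenvalues of a 2x2 Jacobian and counts the modes that are faster than a chosen time step. This threshold test is a simplification introduced here for illustration; the actual CSP criterion for exhausted modes is more elaborate, and the Jacobian entries are hypothetical.

```python
import math


def timescales_2x2(jacobian):
    """Time scales 1/|lambda| of a 2x2 Jacobian with real, nonzero eigenvalues."""
    (a, b), (c, d) = jacobian
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4.0 * det
    if disc < 0.0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    eigenvalues = [(trace - root) / 2.0, (trace + root) / 2.0]
    return sorted(1.0 / abs(lam) for lam in eigenvalues)  # fastest first


def count_fast_modes(timescales, dt):
    """Number of modes whose time scale is shorter than the time step dt."""
    return sum(1 for tau in timescales if tau < dt)


# Hypothetical stiff system: one fast mode (~1e-3) and one slow mode (~1)
jacobian = [[-1000.0, 1000.0],
            [1.0, -2.0]]
taus = timescales_2x2(jacobian)
print(taus, count_fast_modes(taus, dt=0.01))
```

For realistic mechanisms this decomposition must be repeated for every local state, which is the cost that the ANN surrogate of the projection basis is meant to avoid.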


Zhang et al. (2021) proposed a different scheme for chemistry integration, which 
is based on training a deep neural network (DNN) to project a solution of the thermo- 
chemical state vector at a given time (i.e. the input) to the corresponding solution after 
a small time increment (i.e. the output). Figure 16 illustrates the structure of the DNN, 
which was implemented for a dimethyl ether (DME) mechanism with 54 species. 
The input solution at a given time includes 56 neurons for the species, pressure and 
temperature. The DNN features two independent, fully-connected branches for the 
low- and high-temperature oxidation for DME. Each branch has 3 hidden layers with 
1600, 400 and 400 neurons. The output corresponds to the projection of the solution 
at a later time with 56 neurons in the output layer. The approach adopted by Zhang 
et al. (2021) is very reminiscent of the ISAT approach (Pope 1997), except for relying 
on DNNs to project solutions instead of a tree-based storage and tabulation.

Fig. 16 Illustration of the deep neural network for DLODE. Reproduced with permission
from Zhang et al. (2021)


5 Conclusions 


In this chapter, we have illustrated a number of applications of ML tools in combustion
chemistry. These applications span the understanding, the reduction and
the acceleration of chemistry in combustion applications. Based on the material
presented, we anticipate important advances in the following areas:

• The development of experiment-based HyChem-style mechanisms for a broad
range of fuels that also rely on rules extracted for fuels of the same functional
group.

• The implementation of novel ML tools for chemistry reduction either by developing
skeletal mechanisms, global mechanisms or hybrid chemistry models combining
empirical global steps coupled with foundational chemistry.

• The development of chemistry acceleration schemes that exploit dimensionality
reduction of the composition space and features of stiffness removal.


Deep Convolutional Neural Networks
for Subgrid-Scale Flame Wrinkling
Modeling


V. Xing and C. J. Lapeyre 


Abstract Subgrid-scale flame wrinkling is a key unclosed quantity for premixed 
turbulent combustion models in large eddy simulations. Due to the geometrical and 
multi-scale nature of flame wrinkling, convolutional neural networks are good can- 
didates for data-driven modeling of flame wrinkling. This chapter presents how a 
deep convolutional neural network called a U-Net is trained to predict the total flame 
surface density from the resolved progress variable. Supervised training is performed 
on a database of filtered and downsampled direct numerical simulation fields. In an 
a priori evaluation on a slot burner configuration, the network outperforms classical 
dynamic models. In closing, challenges regarding the ability of deep convolutional 
networks to generalize to unseen configurations and their practical deployment with 
fluid solvers are discussed. 


1 Introduction 


As the effects of human activities become increasingly visible on the planet’s climate, 
the combustion of fossil fuels is in need of renewal. Many ambitious carbon reduction 
scenarios, e.g. the IEA’s “Net Zero by 2050” (International Energy Agency 2021), 
suggest a growing reliance on non-carbon fuels such as hydrogen and ammonia in 
the next decade. The large expected increase in intermittent renewable power, notably
solar and wind, is well complemented by these means of storing, transporting, and
distributing energy. While some applications will require fuel cells, it seems that 
combustion still has a large role to play in consuming these energy sources whether 
via adapted gas turbines for power generation, in heaters for homes and offices, in 
engines for propulsion, and even in some industrial processes such as iron or glass 
production. Additionally, the manipulation, storage and transport of these fuels can
lead to various safety issues that must be assessed and accounted for in the design
phases. This is particularly true for hydrogen, which is hard to contain, hard to keep 
in a liquid phase, and has a low flammability limit, meaning leaks can easily arise and 
lead to unwanted fires and explosions. Overall, many new design problems might 
arise for turbulent combustion systems in this upcoming energy transition. 

The relentless increase in computational power enables the use of large eddy 
simulations (LES) to capture fine, unsteady combustion phenomena in ever more 
complex premixed combustion configurations (Vermorel et al. 2017; Carlos et al. 
2021a,b). The main challenge lies in the separation of scales between the finest 
combustion structures—typically of the order of the laminar flame thickness—and 
the extent of the computational domain. This is exacerbated in the aforementioned 
example of hydrogen which burns at higher speeds and in thinner reaction zones 
than hydrocarbon fuels. As a result, one of the major challenges in LES of premixed 
turbulent combustion is the modeling of subgrid-scale (SGS) reaction source terms. 
Turbulent reaction source terms are highly dependent on unresolved interactions 
between fine turbulent scales and the flame front. To first order, this results in the 
increase of the total flame surface via wrinkling of the flame front at resolved and 
unresolved scales, leading to an increased consumption rate of the unburnt gases. 
Inspired by this observation, many premixed turbulent combustion models have been 
built under the flamelet assumption, where the reaction rate is proportional to the 
flame surface area (Poinsot and Veynante 2011). As a result, correctly capturing 
the turbulent combustion rate is contingent on accurate modeling of SGS flame 
wrinkling. 

This chapter will begin in Sect. 2 with an overview of existing SGS wrinkling mod- 
els, with a specific focus on algebraic fractal approaches. The success of dynamic 
approaches (Charlette et al. 2002b; Ronnie et al. 2004) suggests that the inclusion 
of contextual data leads to significant improvements in model accuracy. In this light, 
a promising opportunity for wrinkling modeling is to use convolutional neural net- 
works, which have been at the forefront of recent major advances in computer vision 
and are presented in Sect. 3. The full supervised training and a priori evaluation
of a deep convolutional neural network wrinkling model is presented in Sect. 4. 
Finally, issues that need to be addressed on the path towards the deployment of neu- 
ral network-based wrinkling models in practical LES computations are discussed in 
Sect. 5. 


2 Wrinkling Models 


Turbulent fully premixed flames are commonly modeled using the flamelet assump- 
tion, under which chemical reactions take place in thin layers that are wrinkled but 
not fragmented by turbulence (Peters 1988). Chemical timescales are assumed to be 
fast compared to turbulent processes so that the effects of turbulence can be treated 
independently from the chemistry. Under these assumptions, the evolution of
thermochemical variables can be tracked by a single scalar quantity, the progress variable
c, which increases monotonically from 0 in the unburnt state to 1 in the burnt state.
Flamelet models often assume that the structure of local flame elements measured 
in the progress variable space is identical to that of a one-dimensional laminar flame 
propagating in the normal direction to the flame element, making tabulated chemistry 
an effective method to model the thermochemical state of the flamelet (Benoit 2015). 
Traditional turbulent combustion diagrams (Borghi 1985; Peters 1988, 1999) posit 
that flamelets exist as long as the Kolmogorov lengthscale is larger than the laminar 
flame thickness, $\delta_L$, and turbulent eddies cannot penetrate inside the flame front.
This limitation is challenged by a growing body of work (Skiba et al. 2018; Driscoll 
et al. 2020) that reports experimental and numerical evidence of the existence of 
flamelet structures even for highly turbulent premixed flames (turbulent Reynolds 
number $Re_t \sim 10^5$, Karlovitz number $Ka \sim 500$) and supports the validity of flamelet
models for a much wider range of turbulent flames than previously assumed. 

Under the flamelet assumption, the wrinkling of the reaction layer induced by
turbulence leads to an increase of the turbulent flame speed $s_T$ proportional to the
total flame area $A_T$ (Driscoll 2008):

$$
\frac{s_T}{s_L} = I_0 \frac{A_T}{A_L}, \qquad (1)
$$

where $s_L$, $I_0$, $A_L$ are the unstretched laminar flame speed, stretch factor, and
unwrinkled flame area, respectively. $I_0$ accounts for the effect of differential diffusion, and
although accurate modeling of this factor is still elusive, experimental and DNS
measurements consistently report $I_0$ values close to unity even for highly turbulent
flames (Driscoll et al. 2020). The main obstacle to determining the turbulent flame
speed is therefore the evaluation of the wrinkled flame front surface area. Since LES
of practical turbulent premixed flames typically cannot afford to resolve the smallest
wrinkling scales, the unresolved flame area must be recovered by SGS models.
Following Boger et al. (1998), the transport equation for c is given by: 
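
$$\frac{\partial \bar{\rho}\tilde{c}}{\partial t} + \nabla \cdot (\bar{\rho}\tilde{\mathbf{u}}\tilde{c}) + \nabla \cdot (\overline{\rho \mathbf{u} c} - \bar{\rho}\tilde{\mathbf{u}}\tilde{c}) = \overline{\rho w |\nabla c|} = \langle \rho w \rangle_s \overline{|\nabla c|}, \qquad (2)$$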


where $\rho$, $\mathbf{u}$, $w$ are the density, velocity vector, and flamelet displacement speed, and
$\overline{Q}$, $\tilde{Q} = \overline{\rho Q}/\bar{\rho}$, $\langle Q \rangle_s$ denote filtered, density-weighted filtered, and surface-averaged
versions of a quantity $Q$, respectively. For laminar flame elements that propagate
at the laminar flame speed $s_L$ ($I_0 \approx 1$), the first factor on the right-hand side can be
simplified as $\langle \rho w \rangle_s = \rho_u s_L$ using the unburnt gas density $\rho_u$. The second factor on
the right-hand side is the generalized flame surface density (FSD), denoted $\Sigma = \overline{|\nabla c|}$,
which represents the total surface area per unit volume of the flame front, including
unresolved wrinkles. $\Sigma$ is often connected to the resolved FSD $|\nabla \bar{c}|$ through the
wrinkling factor:


$$\Xi = \Sigma / |\nabla \bar{c}|. \qquad (3)$$

$\Xi$ is equal to one when flame wrinkling is fully resolved, as in the case of a laminar
flame.

Equation 2 forms the basis of flame surface density models, which typically
determine $\Sigma$ or $\Xi$ using a transport equation (Weller et al. 1998; Hawkes and Cant 2000;
Richard et al. 2007) or algebraic models (Boger et al. 1998; Wang et al. 2012;
Mouriaux et al. 2017). For instance, Boger et al. (1998) propose an algebraic expression
for $\Sigma$ in the limit of a thin flame front relative to the filter size $\Delta$:


$$\Sigma = 4 \sqrt{\frac{6}{\pi}}\, \Xi\, \frac{\bar{c}(1 - \bar{c})}{\Delta}, \qquad (4)$$


where $\Xi$ remains to be modeled.
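
As a minimal sketch, the algebraic closure of Eq. 4 can be evaluated pointwise once a value of the wrinkling factor is supplied. The snippet assumes the form Σ = 4 √(6/π) Ξ c̄(1 − c̄)/Δ; the function name `boger_fsd` and the sample values are illustrative assumptions, not taken from any reference implementation.

```python
import math


def boger_fsd(c_bar, delta, xi=1.0):
    """Algebraic total FSD: 4 * sqrt(6/pi) * xi * c_bar * (1 - c_bar) / delta."""
    if not 0.0 <= c_bar <= 1.0:
        raise ValueError("c_bar must lie in [0, 1]")
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    return 4.0 * math.sqrt(6.0 / math.pi) * xi * c_bar * (1.0 - c_bar) / delta


# Illustrative values (assumed): filter size of 0.8 mm and no SGS wrinkling (xi = 1).
delta = 0.8e-3
for c_bar in (0.0, 0.25, 0.5, 1.0):
    print(c_bar, boger_fsd(c_bar, delta))
```

The expression vanishes in the fresh and burnt gases and peaks at c̄ = 0.5, so the modeled flame surface is concentrated in the filtered flame brush.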

The wrinkling factor is also an essential component of LES reaction rate closures 
that use filtering or artificial thickening to deal with insufficient flame resolution. In 
the F-TACLES formalism (Fiorina et al. 2010), unclosed terms are pre-computed 
on filtered 1D laminar flames and tabulated as a function of $\tilde{c}$ and $\Delta$. The turbulent
reaction rate is then obtained by multiplying the tabulated filtered laminar reaction
rate by $\Xi$. Alternatively, the thickened flame model
(TFLES) (Butler and O’Rourke 1977; Colin et al. 2000) artificially thickens the 
flame front by a factor F by multiplying the thermal diffusivity and dividing the 
reaction rate by F. This operation does not affect the flame speed and enables the 
computation of the reaction rate from a set of well-resolved thermochemical variables.
An efficiency factor $E$ compensates for the reduced sensitivity of the thickened flame
front to turbulent wrinkling:


$$E = \frac{\Xi(\delta_L^0)}{\Xi(F \delta_L^0)}, \qquad (5)$$


where $\Xi(\delta_L^0)$ and $\Xi(F \delta_L^0)$ are the wrinkling factors associated with the unthickened
and thickened flame, respectively.

The rest of this chapter will focus on algebraic models for $\Xi$ which have seen
extensive developments over the years and have been comparatively reviewed in the 
literature (Chakraborty and Klein 2008; Ma et al. 2013). They are divided into two 
families: 


• Models based on correlations of the turbulent flame speed (Weller et al. 1998;
Colin et al. 2000; Muppala et al. 2005). These models leverage Eq. 1 to express
$\Xi$ as a function of turbulence parameters such as $u'/s_L$ and $l_t/\delta_L$. For instance, Colin
et al. (2000) propose the expression:


$$\Xi = 1 + \alpha \Gamma_\Delta \frac{u'_\Delta}{s_L}, \qquad (6)$$


where $\Gamma_\Delta$ accounts for the net straining effect of all vortices smaller than $\Delta$,
$u'_\Delta$ is the subgrid turbulence intensity, and $\alpha$ is a model parameter prescribed by the user.

• Models based on a fractal description of the flame front (Gouldin 1987; Gouldin
et al. 1989; Charlette et al. 2002a, b; Ronnie et al. 2004; Fureby 2005; Wang et al.
2011; Hawkes et al. 2012; Keppeler et al. 2014). These are detailed in the
following.


Building on the seminal work of Gouldin (1987) and Gouldin et al. (1989), fractal
models assume that in a range of physical scales bounded by an inner cutoff $\eta$ and
an outer cutoff $L$, the flame front is a fractal surface of dimension $D$ such that
$2 < D < 3$. As a result, the wrinkling factor is given by:


$$\Xi = \left(\frac{L}{\eta}\right)^{D-2}. \qquad (7)$$


Theoretical scaling arguments based on Damköhler’s small and large-scale
limits (Peters 2000) indicate that $D$ ranges from 7/3 in flamelets to 8/3 in high
Karlovitz flames (Hawkes et al. 2012). Experimental measurements lean towards
the lower end of this range, with recent results on highly turbulent flames
reporting $2.1 < D < 2.3$ (Skiba et al. 2021a). $L$ corresponds to the size of the largest
unresolved wrinkles, which is roughly the turbulence integral lengthscale $l_t$ in
RANS (Gouldin 1987) and the combustion filter size $\Delta$ in LES (Knikker et al. 2002;
Charlette et al. 2002b). $\eta$ is the size of the smallest wrinkles, which scales with the
inverse of Ka (Gülder and Smallwood 1995; Skiba et al. 2021a) and is the subject of
careful modeling endeavors in fractal models.
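
To give a feel for the sensitivity of the fractal closure of Eq. 7 to the fractal dimension, the following sketch evaluates Ξ = (L/η)^(D−2) for the values of D quoted above. The scale separation L/η = 8 and the function name are assumptions chosen purely for illustration.

```python
def fractal_wrinkling(outer_cutoff, inner_cutoff, fractal_dim):
    """Fractal wrinkling factor: (L / eta) ** (D - 2)."""
    if not 2.0 <= fractal_dim <= 3.0:
        raise ValueError("fractal dimension must lie between 2 and 3")
    if inner_cutoff <= 0.0 or outer_cutoff < inner_cutoff:
        raise ValueError("cutoffs must satisfy 0 < eta <= L")
    return (outer_cutoff / inner_cutoff) ** (fractal_dim - 2.0)


# Assumed scale separation L / eta = 8, swept over the fractal dimensions quoted in the text.
for d in (2.1, 7.0 / 3.0, 2.5, 8.0 / 3.0):
    print(f"D = {d:.3f}: Xi = {fractal_wrinkling(8.0, 1.0, d):.3f}")
```

With this scale separation, moving from D = 7/3 to D = 8/3 doubles the predicted wrinkling factor, which illustrates why the choice of D (or equivalently of the exponent D − 2) matters so much in these models.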

In Charlette et al. (2002a), the inner cutoff scale $\eta$ is chosen as the inverse of the mean
curvature of the flame $|\langle \nabla \cdot \mathbf{n} \rangle_s|$, with $\mathbf{n}$ the normal vector to the flame front. It is
modeled by assuming an equilibrium between the production and destruction of SGS flame
surface density, and is bounded from below by the laminar flame thickness. The resulting
model is expressed as (Wang et al. 2011):


$$\Xi = \left(1 + \min\left[\frac{\Delta}{\delta_L} - 1,\; \Gamma_\Delta \frac{u'_\Delta}{s_L}\right]\right)^{\beta}, \qquad (8)$$


where $\Gamma_\Delta$ is a vortex efficiency function that serves the same purpose as in the Colin
model of Eq. 6. While the Colin model introduced a multiplicative model parameter
$\alpha$, the Charlette model uses a power-law exponent $\beta$ which is linked to the fractal
dimension by $\beta = D - 2$. A constant value $\beta = 0.5$ ($D = 2.5$) is proposed in the
original paper and leads to a static version of the Charlette model. When $u'_\Delta$ is
sufficiently large, Eq. 8 takes on a saturated form:


$$\Xi = \left(\frac{\Delta}{\delta_L}\right)^{\beta}, \qquad (9)$$


where the wrinkling does not depend on the turbulence intensity.
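
The transition from Eq. 8 to its saturated form can be sketched in a few lines. Here Eq. 8 is taken as Ξ = (1 + min[Δ/δ_L − 1, Γ_Δ u'_Δ/s_L])^β, and the efficiency function Γ_Δ is passed in as a plain number, since its expression is not given in this chapter; the value 0.5 used below is an arbitrary assumption.

```python
def charlette_wrinkling(delta_over_dl, u_over_sl, gamma, beta=0.5):
    """Static Charlette model: (1 + min(Delta/delta_L - 1, Gamma * u'/s_L)) ** beta."""
    return (1.0 + min(delta_over_dl - 1.0, gamma * u_over_sl)) ** beta


def saturated_wrinkling(delta_over_dl, beta=0.5):
    """Saturated form, reached when min() selects its first argument."""
    return delta_over_dl ** beta


# Assumed values: Delta / delta_L = 4 and a constant efficiency Gamma = 0.5.
for u_over_sl in (1.0, 4.0, 10.0):
    xi = charlette_wrinkling(4.0, u_over_sl, gamma=0.5)
    print(f"u'/s_L = {u_over_sl:4.1f}: Xi = {xi:.3f} (saturated: {saturated_wrinkling(4.0):.3f})")
```

Once Γ_Δ u'_Δ/s_L exceeds Δ/δ_L − 1, further increases of the turbulence intensity leave Ξ unchanged, which is the behavior described for the saturated form.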
The power-law parameter $\beta$ can also be determined by a dynamic procedure
(Charlette et al. 2002b) where it becomes a spatially and temporally evolving
quantity. This avoids the delicate and arbitrary choice of one single value for $\beta$, which
is often only justified post hoc by comparison to DNS or experimental data. It is
also supported by empirical evidence highlighting significant spatial and temporal
variations of the fractal dimension in turbulent flames (Keppeler et al. 2014; Skiba
et al. 2021a).

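The dynamic procedure introduces a filtering operation $\widehat{Q}$ at a test-filter size
$\hat{\Delta} = \gamma \Delta > \Delta$ and an averaging operation $\langle Q \rangle$ over a size $\Delta_m > \hat{\Delta}$. By equating
two expressions of the averaged test-filtered total FSD:


$$\left\langle \widehat{\Xi_\Delta |\nabla \bar{c}|} \right\rangle = \left\langle \Xi_{\hat{\Delta}} |\nabla \hat{\bar{c}}| \right\rangle, \qquad (10)$$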


and assuming that $\beta$ is uniform over the averaging volume, a closed-form formula
for $\beta$ can be found. The high levels of turbulence seen in practical turbulent
configurations mean that Eq. 8 often takes its saturated form (Veynante and Moureau 2015)
and in this case, the dynamic expression for $\beta$ is:


$$\beta = \frac{\ln\left( \langle \widehat{|\nabla \bar{c}|} \rangle / \langle |\nabla \hat{\bar{c}}| \rangle \right)}{\ln \gamma}. \qquad (11)$$
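
For reference, the step from Eq. 10 to Eq. 11 can be sketched as follows. The derivation assumes the saturated form $\Xi_\Delta = (\Delta/\delta_L)^\beta$ at both filter levels, a test-filter size $\hat{\Delta} = \gamma\Delta$, and values of $\beta$, $\Delta$ and $\delta_L$ that are uniform over the averaging volume so that they can be taken out of the filtering and averaging operators:

```latex
\left\langle \widehat{\left(\tfrac{\Delta}{\delta_L}\right)^{\beta} |\nabla \bar{c}|} \right\rangle
= \left\langle \left(\tfrac{\gamma\Delta}{\delta_L}\right)^{\beta} |\nabla \hat{\bar{c}}| \right\rangle
\;\Longrightarrow\;
\left\langle \widehat{|\nabla \bar{c}|} \right\rangle
= \gamma^{\beta} \left\langle |\nabla \hat{\bar{c}}| \right\rangle
\;\Longrightarrow\;
\beta = \frac{\ln\!\left( \langle \widehat{|\nabla \bar{c}|} \rangle \,/\, \langle |\nabla \hat{\bar{c}}| \rangle \right)}{\ln \gamma}.
```

For a test filter with a non-negative kernel, the triangle inequality gives $|\nabla \hat{\bar{c}}| \le \widehat{|\nabla \bar{c}|}$, so the argument of the logarithm is at least one and the resulting $\beta$ is non-negative whenever $\gamma > 1$.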


The dynamic Charlette model has been applied to LES of jet flames (Wang et al. 
2011; Schmitt et al. 2015; Volpiani et al. 2016), ignition kernels (Wang et al. 2012; 
Mouriaux et al. 2017), stratified non-swirling burners (Mercier et al. 2015; Proch et al. 
2017), the PRECCINSTA swirled burner (Veynante and Moureau 2015; Volpiani 
et al. 2017), explosions in semi-confined domains (Volpiani et al. 2017), and light- 
around in an annular combustor (Puggelli et al. 2021). It has also seen numerous 
incremental improvements over the years (Wang et al. 2011; Mouriaux et al. 2017; 
Proch et al. 2017) and stands today as a strong model for the SGS wrinkling factor. 


3 Convolutional Neural Networks 


This section gives a primer for uninitiated combustion physicists on deep learning. 
It explores what neural networks are, what the adjective “convolutional” refers to 
in that context, and how Convolutional Neural Networks, a workhorse of the deep 
learning revolution of the past decade, can be put to use for SGS problems. 


3.1 Artificial Neural Networks 


As early as the 1940s, attempts to model the behavior of biological neural networks 
have led to a simple function representing the action of a neuron (McCulloch and 
Pitts 1943). In its simplest form, a neuron sums all of its weighted electrical inputs 
via its dendrites, and the result is fed to a threshold function: if the sum of the input
signals is high enough, an electrical impulse is sent through the axon to other neurons.
Formally:

$$y = \sigma(\mathbf{w}^T \mathbf{x} + b), \qquad (12)$$


where $\mathbf{x}$ is the vector of inputs received by the dendrites, $\mathbf{w}$ the vector of weights
that it applies to each, $b$ is a bias value, $\sigma$ some threshold-like function called the
activation function, and y the resulting signal sent via the axon to other connected 
neurons. Several of these neurons can be connected together, side by side as well 
as front to back, to form a neural network. Networks are part hand-designed, part 
automatically optimized, but in their most simple form they are feedforward, i.e. 
there are no information loops in the network. 
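
A minimal sketch of Eq. 12 and of a tiny feedforward arrangement is given below. The hard threshold `step`, the weights and the two-layer layout are illustrative assumptions; they are hand-prescribed here rather than trained.

```python
def neuron(inputs, weights, bias, activation):
    """Single artificial neuron: y = activation(w . x + b)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(pre_activation)


def step(z):
    """Hard threshold: the neuron fires (1.0) only if its summed input is positive."""
    return 1.0 if z > 0.0 else 0.0


def layer(inputs, weight_rows, biases, activation):
    """Neurons placed side by side share the same inputs and form one layer."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


# Feedforward network: the output of the first layer feeds the second one, with no loops.
x = [1.0, 0.5]
hidden = layer(x, [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], step)
output = neuron(hidden, [1.0, 1.0], -0.5, step)
print(hidden, output)
```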

The understanding of neural biology has advanced well beyond these simple 
models today, but the terminology “neural” has persisted. Modern neural networks 
have moved away from a strict analogy with biological neurons, towards a more 
abstract formalism. A network is composed of a succession of layers that perform 
operations on their input feature map, and pass on the resulting output feature map 
to the next layer. 

Another important choice concerns the activation functions: if $\sigma$ is linear, then so
is each neuron, and stacking several linear neurons successively would be equivalent
to composing several linear functions. The result would still be a linear function that
a single neuron can represent. $\sigma$ is therefore usually non-linear, and its choice is an empirical
trade-off between the non-linearity and the computational complexity it introduces,
as well as some considerations on ease of training. The most common example is the
ReLU or Rectified Linear Unit function: $\sigma(x) = \max(0, x)$. For binary classification
tasks, the last activation function is usually a sigmoid function:


$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad (13)$$


taking values from 0 to 1 that can be interpreted as a class probability.
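
The argument about linear activations can be checked numerically. In the sketch below, two stacked scalar neurons with an identity activation reduce exactly to a single linear neuron, whereas inserting a ReLU breaks this equivalence. The weights are arbitrary illustrative values, and the sigmoid is written in a form that avoids overflow for large negative inputs.

```python
import math


def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)


def sigmoid(x):
    """Sigmoid 1 / (1 + exp(-x)), evaluated without overflow for large |x|."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Arbitrary weights and biases of two stacked scalar neurons.
w1, b1, w2, b2 = 2.0, 1.0, -3.0, 0.5


def stacked_linear(x):
    """Two neurons with identity activation applied one after the other."""
    return w2 * (w1 * x + b1) + b2


def single_linear(x):
    """Equivalent single neuron with weight w1 * w2 and bias w2 * b1 + b2."""
    return (w1 * w2) * x + (w2 * b1 + b2)


def stacked_relu(x):
    """Same stack with a ReLU in between: no longer a linear function of x."""
    return w2 * relu(w1 * x + b1) + b2


for x in (-2.0, 0.0, 3.0):
    print(x, stacked_linear(x), single_linear(x), stacked_relu(x))
```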

Once a network architecture is chosen, it is time to train it. Essentially, training 
means finding the optimal weights w and biases b, called trainable parameters, for 
all the neurons in the network so as to minimize a given loss function. To this end, 
the gradient of the loss function on given training samples with respect to all of the 
trainable parameters can be computed. This error can then be minimized by updating 
the trainable parameters via an optimization procedure, usually a form of iterative 
gradient descent. 

In practice, however, the loss function is often highly non-convex and
high-dimensional, and the error minimization process is too challenging for many
standard gradient descent techniques. Instead, the minimization process is usually
performed using backpropagation and stochastic gradient descent (SGD).
Backpropagation (Rumelhart et al. 1986) is simply the process of progressively computing
the gradient of the error with respect to the trainable parameters in each layer of
the neural network, working backwards (hence the name) from the output to the
input. This is a special case of reverse automatic differentiation, which is now the
standard framework in deep learning libraries to efficiently perform backpropaga- 
tion on complex neural networks. SGD is another trick used by most deep learning 
strategies (Goodfellow et al. 2016). Ideally, the gradient of error with respect to train- 
able parameters should be estimated over the entire training set. However, training 
databases are very large in deep learning, and this is computationally intractable. 
But in many situations, approximating this gradient with a small subset (called a 
mini-batch) of the training database gives a sufficiently good estimate of the overall 
gradient to advance an iterative gradient descent algorithm. This mini-batch-based 
gradient descent is called SGD. 
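
The mini-batch idea can be illustrated on a problem small enough to differentiate by hand. The sketch below fits a one-neuron linear model y = w x + b to synthetic data with mini-batch SGD on the mean squared error; the data, learning rate and batch size are assumptions chosen for illustration, and the gradients are written analytically instead of being obtained by backpropagation.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.1, batch_size=8, epochs=200, seed=0):
    """Fit y = w * x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # draw new random mini-batches at every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the loss estimated on the mini-batch only.
            grad_w = sum(2.0 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2.0 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Synthetic, noise-free data (assumed for illustration): y = 3 x - 1 on [0, 1].
xs = [i / 63.0 for i in range(64)]
ys = [3.0 * x - 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
print(w, b)
```

Each update only touches 8 of the 64 samples, yet the iterates still approach the parameters used to generate the data, which is the property that makes SGD usable on databases too large for full-gradient evaluations.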

Machine learning models are trained to capture all the meaningful features of 
the training dataset that are relevant to their learning task. If a model is under- 
parametrized, it can fail to fit the training dataset adequately, leading to a behavior 
named underfitting. For this reason, modern neural networks contain a very large
number of parameters, exceeding a hundred billion in some recent architectures (Brown
et al. 2020). This can however lead them to learn too much, eventually memorizing the
full training dataset, a process called overfitting. Although this results in a
very low loss function during training, an overfitted network performs poorly on data
outside of the training dataset, meaning that it fails to generalize. To guard against
this, overfitting must be monitored during training. This is done by reserving part of
the dataset as a separate validation set, which is never used to optimize the
network's weights directly. The quality of predictions on this validation set is evaluated
regularly during training, revealing when the generalization performance starts
to degrade, which suggests that the network has started to learn the specific noise of
the data and is no longer improving on the general task. The compromise between
underfitting and overfitting is called the bias-variance trade-off (Goodfellow et al. 
2016) and is central to any machine learning task. 
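
One common way to act on this validation monitoring is early stopping, where training is halted once the validation loss has stopped improving for a given number of epochs. The chapter only describes the monitoring itself, so the stopping rule, the `patience` parameter and the loss values below are assumptions used for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (epoch of best validation loss, epoch at which training stops)."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch  # no improvement for `patience` epochs
    return best_epoch, len(val_losses) - 1


# Made-up validation losses: generalization improves, then degrades (overfitting).
val_losses = [1.0, 0.6, 0.4, 0.35, 0.37, 0.41, 0.48, 0.60]
best, stop = early_stopping_epoch(val_losses)
print(best, stop)
```

The weights saved at the best epoch would then be the ones retained for inference.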


3.2 Convolutional Layers 


Neural networks built only with fully connected (FC) layers, where each neuron is
connected to every neuron of the previous layer, are called multi-layer perceptrons
(MLPs), i.e. simple stacks of successive FC layers. While this gives some
choice in the design of the network (number of dense layers, number of neurons in 
each layer, activation functions...), other more specialized layers have been proposed 
for specific tasks. For image data, where the pixels have a matrix structure, convolu- 
tional layers (ConvLayers) are usually used. For the purpose of physical modeling, 
it is believed that a direct analogy between pixels in images and discretized physical 
fields can be made. The output of a ConvLayer is obtained by the convolution of 
its kernel, containing its trainable parameters, with its input feature map, as illus- 
trated in Fig. 1. Multiple independent channels, each with its own kernel, are usually 
used to enhance the expressiveness of the layer. Each kernel (here of size 3 × 3,
in gray) is convolved with the input matrix, producing a new matrix at the output.


Fig. 1 Convolutional layer on a 2D matrix (e.g. an image). Input pixels (bottom) are convolved
with a 3 × 3 kernel to produce the output pixels (top) one by one


These convolutional kernels are the basis of many image processing methods, where
the kernel weights are prescribed to perform tasks such as contour detection, Gaussian
blur, denoising, etc. In a ConvLayer, the weights of the kernel (here 9 values)
are the learnable parameters that are to be adjusted by the learning process instead 
of being explicitly prescribed. ConvLayers are well-adapted to dealing with spa- 
tial grids because of their translation equivariance and local consistency inductive 
bias (Battaglia et al. 2018). Since the same kernel is used for all input locations, the 
number of parameters of a ConvLayer is typically lower than in an FC layer. More- 
over, unlike an FC layer, the number of parameters in a ConvLayer does not depend 
on the size of the input feature map, making it a good choice to process inputs of 
large dimensions like 3D computational domains. 
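
The operation performed by a single-channel ConvLayer, and the parameter-count argument above, can be sketched in plain Python. The kernel below is hand-prescribed (a horizontal-gradient detector) instead of being learned, and the helper names and the 64 × 64 image size are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    As in most deep learning libraries, the kernel is not flipped, so this is
    strictly speaking a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_params(kernel_size, in_channels, out_channels):
    """2D ConvLayer: kernel weights plus one bias per output channel."""
    return out_channels * (in_channels * kernel_size ** 2 + 1)


def fc_params(n_inputs, n_outputs):
    """FC layer: every input is linked to every output, plus one bias per output."""
    return n_outputs * (n_inputs + 1)


# Hand-prescribed 3 x 3 horizontal-gradient kernel applied to a vertical edge.
image = [[0, 0, 1, 1, 1]] * 4
kernel = [[-1, 0, 1]] * 3
print(conv2d_valid(image, kernel))
# Parameter counts for mapping a 64 x 64 single-channel image to one of the same size.
print(conv_params(3, 1, 1), fc_params(64 * 64, 64 * 64))
```

The ConvLayer needs 10 parameters whatever the image size, while the FC layer mapping a 64 × 64 image to an output of the same size needs more than 16 million.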

Adding the ConvLayer to the layer arsenal leads to new network architectures, 
called convolutional neural networks (CNNs). Interestingly, shallow ConvLayers 
of a CNN have been observed to learn Gabor filters, which naturally occur in the 
visual cortex of mammals and are often chosen to extract image features in hand- 
made image classifiers (Goodfellow et al. 2016). CNNs have been applied with great 
success for image-based tasks since the 1990s (LeCun et al. 1998), and have fueled 
the deep learning craze since the early 2010s successes (Krizhevsky et al. 2012) on 
the ImageNet classification challenge (Deng et al. 2009). Empirical evidence has 
shown that stacking small convolutional kernels leads to better performance than a 
single equivalent large kernel (Simonyan and Zisserman 2015; Szegedy et al. 2015). 
Depth is thus an important hyperparameter in CNNs, and deep CNNs have been 
universally used in recent breakthroughs in computer vision (He et al. 2015; Brock 
et al. 2019; Tan and Quoc 2019; Chen et al. 2020). Two of the most common learning 
tasks in computer vision, specifically when dealing with images, are classification 
and segmentation. 

Image classification (Fig. 2a) is a task where a discrete label must be determined
for an image. In the simple case of classifying cat and dog images, the probability
$P_{cat}$ that the image contains a cat is predicted by the network, and $P_{dog} = 1 - P_{cat}$ is
inferred. If $P_{cat} > 0.5$, the label for the image is determined to be cat. Otherwise, it
is dog. This prediction can then be compared to a truth value in the training database,
and the network weights can be updated as described in Sect. 3.1. More generally,
there can be more than 2 classes to choose from, and more than one class can be
present at the same time. CNNs designed for classification tend to have a funnel-like
shape, with a high-dimensional input (several thousand pixels, possibly in color) and
a low-dimensional output (only 2 in our example, 1000 in the ImageNet dataset (Deng
et al. 2009)).


Fig. 2 Typical CNN tasks: a classification, where an image is classified according to a discrete list
of labels; and b segmentation, where each pixel is classified according to a discrete label

Image segmentation (Fig. 2b) consists of identifying and classifying meaningful
instances in an image by outlining them with labeled masks. Continuing with the
previous example, the precise pixels belonging to the cat are sought. This changes
the architecture of the network, which no longer needs to reduce the dimension of its
output. Instead, the output has the same shape as the input, and each pixel is classified
as cat (1) or not (0). As a result, the layers chosen in the network must ensure that
the problem dimensionality is preserved at the output.


3.3 From Segmentation to Predicting Physical Fields with CNNs


A specific neural network architecture initiated a series of excellent results on image 
segmentation tasks: the so-called U-Net (Ronneberger et al. 2015). This network, 
introduced to detect tumors in medical images, can now be found in a variety of 
projects, in its original form or in one of numerous variations (Çiçek et al. 2016; 
Falk et al. 2019; Oktay et al. 2018), including in fluid dynamics (Wandel et al. 
2021). Its structure is that of a “double funnel”, one encoding the image into small 
but numerous feature maps, and the other upscaling back to the input dimension 
(Fig. 3). Compared to simple linear architectures (Fig. 2), the U-Net introduces skip 
connections between some of the blocks, meaning data flows both to the lower 
blocks (with deeper encoding of the features) and directly to the same-size output. 
The intuition behind this is that in order to perform a segmentation decision on a 
given pixel, a multi-scale analysis is needed. The influence of neighbouring pixels
provides information on local textures. Pixels further away (equivalent to a “zoomed-out”
view of the image) give information about the general shapes in the vicinity. Pixels
further still (seen by the deepest levels of the U-Net) offer an analysis of the position
of the shapes relative to each other. In the second (right in Fig. 3) half of the network, these
levels of analysis coalesce gradually to form the final decision.


Fig. 3 Architecture of a U-Net neural network. Convolutional layers operate in a “double
funnel” fashion, first reducing the feature map size, then increasing it again to match the input. Skip
connections are used between matching-size layers

This process has analogies with the dynamic procedure of Eq. 11. Indeed, the 
dynamic estimation of $\beta$ relies on observing the field of c at the resolved scale and
the test-filter scale. Similarly, the first layer of a U-Net learns to detect structures on 
a 3-pixel wide stencil, and deeper layers aggregate features coming from several of 
these patches, effectively working at a larger scale. The U-Net can therefore be seen 
as a generalization of the concept introduced by dynamic models, where the effect 
of multiple scales on the target prediction is learned from the data, instead of only 
the resolved and test-filtered scales. This motivates the application of this type of 
network to the problem of predicting sub-grid scale wrinkling. 

Some adaptations are needed to use a traditional U-Net on LES fields: 


• The U-Net performs a regression task, predicting SGS values, instead
of a segmentation task. The final activation function should thus be a ReLU or an
identity function.

• The U-Net must handle 3D data instead of 2D images. This poses very little
challenge, as most modern implementations of neural network libraries natively
offer 3D convolutional layers with the same functionality as classical 2D ones.

• Because the CNN is designed to work on structured data (pixels in image
applications), it must operate on a homogeneous, isotropic mesh. This might mean that the
field from a CFD mesh must be interpolated onto such a mesh. This limitation is
due to the use of a “vanilla” U-Net, with no adaptations to more complex meshes.
However, modern implementations with graph neural networks (Pfaff et al. 2021)
could perform operations directly on an unstructured mesh if needed.


4 Training CNNs to Model Flame Wrinkling 


This section presents the complete process of training and evaluating the CNN as
a wrinkling model by following the steps described in Lapeyre et al. (2019). Full
details are contained in the original paper, and code and data are available online at
https://gitlab.com/cerfacs/code-for-papers/2018/arXiv_1810.03691.


4.1 Data Preparation


The training and evaluation datasets are generated from the DNS of a slot burner
configuration simulated with the AVBP unstructured compressible code (Schonfeld
and Rudgyard 1999; Selle et al. 2004). A fully premixed stoichiometric mixture
of methane-air unburnt gases is injected in a central rectangular inlet section at
U = 10 m/s and surrounded by a slow coflow of burnt gases. The domain is a
rectangular box meshed with a homogeneous grid containing 512 × 256 × 256 hexahedral
elements of size Δx = 0.1 mm which resolve the reaction zone of the flame front
on 4–5 points. A turbulent velocity field generated from a Passot-Pouquet
spectrum (Passot and Pouquet 1987) is superimposed on the unburnt gas inlet velocities.
Three separate DNS simulations are run:


• DNS1: inlet turbulence fluctuation intensity $u'$ chosen such that $u'/s_L = 1.25$,

• DNS2: increased inlet turbulence, $u'/s_L = 2.5$,

• DNS3: starting from a steady-state snapshot of DNS2, the inlet velocity U is
doubled for 1 ms, then set back to its initial level for 2 ms. This triggers the
formation of a detached pocket of unburnt gases as evidenced in Fig. 4.


The training dataset is built from 50 snapshots of DNS1 and 50 snapshots of DNS2 
extracted at 0.2 ms intervals in the steady-state regime. Similarly, the evaluation 
dataset is made up of 15 snapshots of DNS3. The slightly different large-scale flow 
dynamics and flame front geometry make it a good choice to assess the generalization 
of the CNN on a distribution close to that of the training set. 

For each snapshot, the DNS field of $c$ is filtered with a Gaussian kernel and
downsampled to a coarse 64 × 32 × 32 grid with a coarse cell size $8\Delta x$ to generate
$\bar{c}$ and $\Sigma = \overline{|\nabla c|}$. The network is trained to predict $\Sigma^+ = \Sigma / \Sigma^{\max}_{\mathrm{lam}}$ corresponding to
an input field of $\bar{c}$. $\Sigma^+$ is the total FSD normalized by its maximum value measured
on a laminar flame discretized on the same grid. While the values of $\Sigma$ are specific
to a given flame and coarse grid, $\Sigma^+$ is a generic quantity that reflects the amount of
unresolved wrinkling and should be more amenable to generalization. Normalizing
the target value around 1 is also beneficial for the convergence of the early phase of
SGD, since the output of the CNN resulting from inputs $\bar{c}$ and initial weights of the
order of 1 will also be of the order of 1.
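
The filtering, downsampling and normalization steps can be sketched on a one-dimensional profile. This is a simplified stand-in for the 3D procedure, not the processing code of the original study: the tanh profile, its thickness, the Gaussian filter width and all function names are assumptions. Because the profile is laminar, normalizing by its own maximum plays the role of the laminar reference value.

```python
import math


def gaussian_weights(sigma_cells, half_span):
    """Normalized discrete Gaussian kernel; sigma_cells is its std. dev. in fine cells."""
    raw = [math.exp(-0.5 * (k / sigma_cells) ** 2) for k in range(-half_span, half_span + 1)]
    total = sum(raw)
    return [v / total for v in raw]


def filter_and_downsample(field, weights, stride):
    """Filter a 1D field (end values repeated) and keep one sample per coarse cell."""
    half, n = len(weights) // 2, len(field)
    coarse = []
    for i in range(stride // 2, n, stride):
        coarse.append(sum(w * field[min(max(i + k - half, 0), n - 1)]
                          for k, w in enumerate(weights)))
    return coarse


def gradient_magnitude(field, dx):
    """|dc/dx| by central differences, one-sided at both ends."""
    n = len(field)
    grad = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grad.append(abs(field[hi] - field[lo]) / ((hi - lo) * dx))
    return grad


# Assumed 1D laminar-like profile on a fine grid of 512 cells of 0.1 mm.
dx, n_fine, stride = 0.1, 512, 8
c = [0.5 * (1.0 + math.tanh((i * dx - 25.6) / 0.4)) for i in range(n_fine)]
weights = gaussian_weights(sigma_cells=4.0, half_span=16)  # assumed filter width

c_bar = filter_and_downsample(c, weights, stride)                          # network input
sigma = filter_and_downsample(gradient_magnitude(c, dx), weights, stride)  # total FSD
sigma_plus = [s / max(sigma) for s in sigma]                               # normalized target
print(len(c_bar), max(sigma_plus))
```

Note that the gradient magnitude is computed on the fine grid before filtering, which is what distinguishes the total FSD from the resolved FSD obtained by differentiating the filtered field.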

Fig. 4 Slices of progress variable field at t = 0 ms (left) and t = 1 ms (right) into DNS3. Top:
DNS fields, bottom: filtered fields downsampled on the coarse mesh. The transient inlet velocity
step leads to the separation of a pocket of unburnt gases
4.2 Building and Analyzing the U-Net


The U-Net architecture of Lapeyre et al. (2019) is detailed in Fig. 5. It follows a
fully convolutional, symmetrical, three-stage encoder-decoder structure. Each stage
is composed of two successive combinations of


• a 3D convolution with a 3 × 3 × 3 kernel,
• a batch normalization layer (Ioffe and Szegedy 2015),
• a rectified linear unit (ReLU) nonlinear activation unit,


followed by 2 × 2 × 2 pooling operations. In the encoder, max-pooling operations
decrease the spatial dimensions of the feature maps by a factor of 2. The shape of
the input field is then recovered by upsampling operations in the decoder.

The network contains a total of 1.4 million trainable parameters. In cases where 
a smaller network would be preferable, the parameter count could be reduced by
using simpler neural network architectures (Shin et al. 2021) or by investigating 
architecture search and pruning methods (Frankle and Carbin 2019). On an Nvidia 
Tesla V100 GPU, training the network to convergence in 150 epochs takes 20 min, 
and inference on a single snapshot of DNS3 only requires 12 ms. 

A key property of vision-based neural networks is their receptive field (RF), which 
corresponds to the input region that can influence the prediction on a single output 
point (Goodfellow et al. 2016). In practice, due to the distribution of the hidden 
layer connections inside the network, points located at the center of the receptive 
field contribute more to the prediction than those at the periphery. This leads to 
the notion of effective receptive field (ERF) (Luo et al. 2016) which measures the 
extent of the receptive field that is actually meaningful to the prediction, and can 
be quantified by counting the number of connections originating from each input
location. Figure 6 compares the extent of the ERF of the U-Net with the DNS3
flame. The size of the ERF (Luo et al. 2016) is approximately 7.6 times the filtered
laminar flame thickness and is large enough to encompass all of the large-scale
structures of the flame front. In comparison, the context size of the Charlette dynamic
model can be estimated as the averaging filter size, which is typically 2–6 times the
filtered laminar flame thickness (Veynante and Moureau 2015; Volpiani et al. 2016).
Increasing the context size of the dynamic model may lead to numerical issues caused
by flame/boundary and flame front interactions (Mouriaux et al. 2017) and greatly
impacts the computational cost of the procedure (Volpiani et al. 2016), whereas for
CNNs it can simply be achieved by using a deeper network.


Fig. 5 Diagram of the U-Net architecture. Feature maps are represented by rectangles with their
number of channels above. Arrows represent the hidden layers connecting the feature maps


Fig. 6 ERF superimposed on iso-lines of $\bar{c}$ on a slice of a DNS3 snapshot (t = 0.8 ms). Grayscale
intensity in the ERF is proportional to the impact of the input voxel location on the output prediction
at the center of the ERF. Dashed circular line: edge of the ERF
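
The link between network depth and context size can be illustrated with the standard recurrence for the theoretical receptive field of a stack of convolution and pooling layers. The layer lists below are hypothetical encoder paths and are not a transcription of the architecture of Fig. 5; the theoretical value is also only an upper bound on the ERF discussed above.

```python
def receptive_field(layers):
    """Theoretical receptive field (in input cells, per direction) of a layer stack.

    `layers` is a sequence of (kernel_size, stride) pairs applied in order.
    """
    rf, jump = 1, 1  # jump: spacing, in input cells, between neighbouring outputs
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


CONV, POOL = (3, 1), (2, 2)  # 3-wide convolution, 2-wide pooling with stride 2
# Hypothetical encoder paths: two convolutions per level, one pooling between levels.
three_levels = [CONV, CONV, POOL, CONV, CONV, POOL, CONV, CONV]
four_levels = three_levels + [POOL, CONV, CONV]
print(receptive_field(three_levels), receptive_field(four_levels))
```

Adding one pooling level and two convolutions roughly doubles the receptive field in this example, at the price of only a few additional layers.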


4.3 A Priori Validation 


After training the CNN on snapshots of DNS1 and DNS2, it is evaluated a priori 
on snapshots of DNS3 which are fully separate from the training dataset. The values 
of the trained weights of the CNN are frozen, and the model behaves like a large 
parametric function mapping $\bar{c}$ to $\Sigma^+$. In Fig. 7a, the Charlette and CNN models are
compared by plotting the downstream evolution of the total flame surface area that 
they predict on the DNS3 snapshot with the largest DNS total flame surface. For 
reference, target flame surface values from the DNS and values obtained without 
any SGS modeling are also shown. In this snapshot, the flame contains three distinct 
regions: a weakly turbulent flame base attached to the inlet (x ~ 0–15 mm), followed
by a detached pocket of unburnt gases (x ~ 15–45 mm) and a postflame region of
combustion products with no flame front. 
Fig. 7 A priori evaluation of a selection of wrinkling models. (a) Evolution of the total flame
surface area along the streamwise x direction on a DNS3 snapshot (t = 0.8 ms). The flame surface
values are computed by integrating the total FSD on cross-section slices of the width of a coarse
cell. (b) Time evolution of the error on the domain-integrated total flame surface area relative
to the target values on DNS3
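
The two diagnostics of Fig. 7 can be sketched as follows, under the assumption that the FSD is stored as a nested list indexed as `fsd[i][j][k]` with `i` the streamwise index on a uniform coarse grid. The function names and the tiny sample field are illustrative, not taken from the original post-processing.

```python
def flame_area_per_slice(fsd, cell_size):
    """Integrate a 3D FSD field fsd[i][j][k] over each cross-section slice of index i.

    Each slice is one coarse cell thick, so the flame area it contains is the
    sum of the FSD over the slice multiplied by the cell volume.
    """
    cell_volume = cell_size ** 3
    return [sum(sum(row) for row in plane) * cell_volume for plane in fsd]


def relative_error(predicted_areas, target_areas):
    """Signed relative error on the domain-integrated flame surface area."""
    return (sum(predicted_areas) - sum(target_areas)) / sum(target_areas)


# Tiny made-up field with 3 streamwise slices of 2 x 2 cells, cell size 0.5.
fsd = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 2.0], [3.0, 2.0]],
    [[0.5, 0.5], [0.0, 0.0]],
]
areas = flame_area_per_slice(fsd, cell_size=0.5)
print(areas, relative_error([0.0, 0.9, 0.125], areas))
```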


The static Charlette model with constant $\beta = 0.5$ finds the correct trend but
consistently fails to accurately match the DNS flame surface values. The dynamic Charlette
model with local $\beta$ ($\hat{\Delta} = 1.5\Delta$, $\Delta_m = 2\Delta$) using the corrections from Wang et al.
(2011) and Mouriaux et al. (2017) performs very well in the detached pocket and 
close to the inlet, but still struggles near the tip of the attached flame which features 
prominent flame front interactions. Finally, the CNN agrees nearly perfectly with 
the target values in all regions of the domain. Figure 7b shows that this behavior is 
consistent throughout the whole duration of DNS3, whereas the error made by the 
Charlette dynamic model fluctuates in time. 
5 Discussion


Deep CNNs trained to model SGS wrinkling show excellent modeling accuracy and 
consistency when compared to existing algebraic models on evaluation configura- 
tions that are similar to their training database. To move towards applications to 
practical complex configurations, some key questions still need to be addressed: 


1. What information should be provided to the model? The U-Net presented above 
only used the field of c as input, but algebraic wrinkling models usually
incorporate additional parameters like $u'/s_L$, $l_t/\delta_L$, Ka, ...

2. To what extent can the model generalize to unseen configurations? Currently, the 
training dataset is built from DNS data which is rarely available in practice. If the 
model cannot reliably generalize well enough beyond its training distribution, 
this would severely limit its range of application. 

3. Can the model be coupled to a fluid solver for on-the-fly predictions? 


These questions apply broadly to any neural network model trained to predict 
an LES SGS quantity, not only to wrinkling models. Question 1 comes down to 
isolating the essential physical and numerical quantities that drive SGS wrinkling. A 
first meaningful quantity is the spatial distribution of c which identifies the location 
and thickness of the flame front in a premixed flame. Deep CNNs like the U-Net are 
presumably able to extract all the contextual information they need from the entire 
field of c, and indeed experiments have indicated that providing gradients of c as 
additional inputs does not improve their accuracy. Other works that opt to use simpler 
architectures with fewer trainable parameters do include gradient information in the 
input of the network. Shin et al. (2021) train a shallow MLP combined with a mixture 
density network that captures the stochastic distribution of ©. Since the MLP only 
processes local data, |∇c̃| and |∇c̄| fields are used as additional inputs to provide
some spatial context. Ren et al. (2021) use a network composed of a shallow 2D 
convolutional base followed by five fully connected layers. Local predictions are 
computed from 3 x 3 box stencils of the filtered fields of c, |Vc| and the subgrid 
turbulence intensity u’, discretized on the fine DNS grid. 

Another relevant parameter is u′_Δ/s_L, which controls the amount of total flame surface
wrinkling and is a crucial quantity in many wrinkling models covered in Sect. 2.
Nonetheless, the challenges inherent to modeling u′_Δ from LES quantities (Colin
et al. 2000; Veynante and Moureau 2015; Langella et al. 2017, 2018) have made
the saturated Charlette dynamic model (Eq. 9) an attractive solution that does not
directly depend on u′_Δ.
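To make the appeal of the saturated form concrete, the following minimal sketch evaluates a power-law wrinkling factor Ξ = (Δ/δ_L)^β, which is the commonly quoted saturated limit of the Charlette model. Eq. 9 is not reproduced in this section, so this expression, the function name `saturated_wrinkling`, and the lower bound Ξ = 1 for Δ < δ_L are assumptions made for illustration only.

```python
def saturated_wrinkling(delta_over_dl: float, beta: float) -> float:
    """Assumed saturated form: Xi = (Delta / delta_L) ** beta, with no u' dependence."""
    if delta_over_dl < 1.0:
        # Assumption: a filter smaller than the flame thickness leaves no
        # unresolved wrinkling, so the factor is clipped to one.
        return 1.0
    return delta_over_dl ** beta


# The only inputs are the filter-to-flame-thickness ratio and the exponent beta.
for ratio in (1.0, 4.0, 8.0):
    print(ratio, saturated_wrinkling(ratio, beta=0.5))
```

In a dynamic procedure the exponent β would be computed locally rather than fixed, but the inputs stay the same, so no model for the subgrid turbulence intensity is required.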

Finally, the proportion of unresolved flame wrinkling in the total flame surface
is determined by the filter size Δ. Since CNNs work on grid data with no explicit
distance embedding, Δ/δ_L sets the resolution of the filtered flame structures that are
processed by the network. Figure 8 illustrates the ambiguity that may arise if Δ is not
known by the network. There is an infinite number of combinations (c, Δ) that can
lead to a given filtered field c̄, each corresponding to a different amount of SGS
wrinkling, and the sole knowledge of c̄ is not sufficient to discriminate between them.
Additionally, CNNs are known to be sensitive to resolution discrepancies between the
training and evaluation datasets (Touvron et al. 2019). This issue was avoided in Lapeyre
et al. (2019) by training and evaluating the U-Net at the same Δ/δ_L but should be
considered when generalizing to arbitrary flame resolutions.


Fig. 8 Illustration of the filtering ambiguity. A filtered flame front (bottom) outlined by iso-lines
of c̄ can correspond to several unfiltered flames (top), each with a different filter size and mean
wrinkling factor
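The dependence of the unresolved wrinkling on the filter size can be illustrated with a small a priori experiment. The sketch below builds a synthetic sinusoidally wrinkled front, applies a top-hat filter of two different widths, and compares the total flame surface (integral of |∇c|) with the resolved one (integral of |∇c̄|). The synthetic front, the box filter, and the helper names `box_filter` and `flame_surface` are assumptions chosen for illustration; they are not the filtering procedure used to build the DNS database.

```python
import numpy as np


def box_filter(field, width):
    """Separable top-hat filter over `width` cells (odd); edge-padded in x, periodic in y."""
    half = width // 2
    nx, ny = field.shape
    padded_x = np.pad(field, ((half, half), (0, 0)), mode="edge")
    filtered_x = sum(padded_x[i:i + nx, :] for i in range(width)) / width
    padded_y = np.pad(filtered_x, ((0, 0), (half, half)), mode="wrap")
    return sum(padded_y[:, j:j + ny] for j in range(width)) / width


def flame_surface(c, dx):
    """Integral of |grad c| over the domain (central differences, periodic in y)."""
    grad_x = np.gradient(c, dx, axis=0)
    grad_y = (np.roll(c, -1, axis=1) - np.roll(c, 1, axis=1)) / (2.0 * dx)
    return np.sum(np.hypot(grad_x, grad_y)) * dx**2


# Synthetic wrinkled front on a unit square (hypothetical test field).
n = 256
dx = 1.0 / n
x, y = np.meshgrid(np.arange(n) * dx, np.arange(n) * dx, indexing="ij")
front = 0.5 + 0.1 * np.sin(2.0 * np.pi * 4.0 * y)
c = 0.5 * (1.0 + np.tanh((x - front) / 0.02))

# Mean wrinkling factor = total surface / resolved surface, for two filter widths.
total = flame_surface(c, dx)
xi = {width: total / flame_surface(box_filter(c, width), dx) for width in (9, 33)}
print(xi)
```

The same unfiltered field yields a different mean wrinkling factor for each filter width, which is why a network that sees only the filtered field needs Δ to be fixed by the dataset or supplied in some other way.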

To move towards generalizable SGS neural network models, u′_Δ/s_L and Δ/δ_L
should henceforth be accounted for in the model either implicitly, in the choice of 
the training and evaluation datasets, or explicitly, by incorporating them in the model 
inputs or feature maps. Xing et al. (2021) started to investigate this by evaluating 
a U-Net trained on a statistically planar turbulent flame to predict the SGS vari- 
ance of the progress variable c’. A jet flame evaluation configuration (Luca et al. 
2019) was chosen to test the ability of the network to generalize to a case featuring 
major differences from the training dataset regarding the large-scale flow and flame 
structures, thermophysical, and chemical parameters. The U-Net was observed to 
generalize better than existing dynamic approaches when u′_Δ/s_L and Δ/δ_L were
chosen to match between the training and generalization configuration. Its perfor- 
mance dropped when either of these parameters did not match the unique values 
of the training set. However, when trained on a dataset containing a range of filter 
sizes, the U-Net was able to discriminate between the various Δ/δ_L values without
explicitly providing Δ/δ_L as an input parameter. Apart from u′_Δ/s_L and Δ/δ_L, the
inclusion of other relevant physical quantities can be investigated through feature 
importance analysis (Yellapantula et al. 2020). 
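One simple way to account for these parameters explicitly is to append them to the network input as constant channels. The sketch below shows this idea only; the function `build_cnn_input` and the channel ordering are hypothetical and are not taken from Xing et al. (2021) or any other cited work.

```python
import numpy as np


def build_cnn_input(c_bar, delta_over_dl, u_over_sl):
    """Stack the filtered progress variable with two constant parameter channels.

    Returns an array of shape (3, H, W): channel 0 is the field itself,
    channels 1 and 2 broadcast the scalar parameters over the grid.
    """
    ones = np.ones_like(c_bar)
    return np.stack([c_bar, delta_over_dl * ones, u_over_sl * ones], axis=0)


# Hypothetical filtered field and parameter values.
c_bar = np.random.default_rng(0).random((32, 32))
network_input = build_cnn_input(c_bar, delta_over_dl=4.0, u_over_sl=2.5)
print(network_input.shape)
```

The implicit alternative described above leaves the input unchanged and instead varies these parameters across the training set.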

The limits to generalization of SGS neural network models are still not well 
understood. Generalization is usually assessed by evaluating the model on the training 
distribution sampled at different spatial (Henry de Frahan et al. 2019; Wan et al. 
2020) or temporal (Bode et al. 2021; Cellier et al. 2021; Chen et al. 2021) locations, 
or through minor parametric variations (Nikolaou et al. 2019; Lapeyre et al. 2019; 
Yao et al. 2020; Yellapantula et al. 2020; Chen et al. 2021). For wrinkling models 
specifically, Ren et al. (2021) study highly turbulent statistically stationary planar 
flames at Ka = 38 (case L), 390 (case M), and 1710 (case H). Cases M and H are
located in the broken reaction zone regime, where the flamelet assumption may 
not hold. Snapshots show a highly fragmented reaction front and the authors point 
out that the resolved and total FSD fields have large discrepancies for these cases. 
After training on case H, the model performs well on case M and at larger filter 
sizes, beating a selection of static wrinkling models. It is interesting to note that it 
performs relatively poorly on case L which belongs to the thin reaction zone regime 
and features an intact reaction zone. This result highlights the model’s sensitivity 
to changes in the turbulent combustion regime. Attili et al. (2021) draw similar 
conclusions after training the U-Net from Lapeyre et al. (2019) on four DNS of 
jet flames with increasing Reynolds numbers (Luca et al. 2019). Their results show 
that generalization to unseen turbulence levels works better between high Reynolds
number flames, which they suggest is due to the asymptotic behavior of high Reynolds 
turbulence. In addition, models trained on a specific region of the flame (flame base, 
fully turbulent region, or flame tip) perform noticeably worse when tested on a 
different region, thus highlighting the spatial variations of the wrinkling distribution 
in a given flame. 

Supervised training of neural networks is a form of inductive learning, for which 
generalization depends on the inductive biases of the model (Griffiths et al. 2010). 
These are the factors outside of the observed data that intrinsically steer the model 
towards learning a specific representation. Generalization is largely driven by how 
well the model’s inductive biases fit the properties of the data representation it is 
trained to learn. The inductive biases of neural networks are heavily influenced by 
their architecture. MLPs have weak inductive biases, whereas CNNs have strong 
locality and translation equivariance inductive biases (Battaglia et al. 2018) which 
explains their success in generalizing on computer vision tasks (Zhang et al. 2020).
Since locality and translation equivariance are also desirable properties of an SGS
model, CNNs seem better suited than MLPs to generalize on SGS modeling tasks. 
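Translation equivariance can be checked numerically: shifting the input of a convolution and then convolving gives the same result as convolving and then shifting the output. The sketch below verifies this for a single periodic 3 x 3 convolution written in NumPy. It illustrates the property only, under the assumption of periodic boundaries, and is not a piece of the U-Net.

```python
import numpy as np


def periodic_conv2d(field, kernel):
    """Cross-correlate `field` with a small kernel using periodic boundaries."""
    kh, kw = kernel.shape
    out = np.zeros_like(field)
    for i in range(kh):
        for j in range(kw):
            # np.roll by (kh//2 - i) brings field[p + i - kh//2] to position p.
            shifted = np.roll(field, shift=(kh // 2 - i, kw // 2 - j), axis=(0, 1))
            out += kernel[i, j] * shifted
    return out


rng = np.random.default_rng(0)
field = rng.random((16, 16))
kernel = rng.random((3, 3))
shift = (3, 5)

conv_then_shift = np.roll(periodic_conv2d(field, kernel), shift, axis=(0, 1))
shift_then_conv = periodic_conv2d(np.roll(field, shift, axis=(0, 1)), kernel)
equivariant = np.allclose(conv_then_shift, shift_then_conv)
print(equivariant)
```

A fully connected layer acting on the flattened field has no such constraint, so an MLP must learn from data any translation behavior it needs.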

On the other hand, coupling CNNs with a fluid solver for on-the-fly predictions 
and a posteriori validation comes with numerous implementation challenges. In the 
case of the U-Net, its field-to-field nature allows it to output predictions in the entire 
domain in a single inference of the network, which is a strong asset for computa- 
tions on large meshes. However, the input field needs to be built by gathering LES 
data points from the whole domain, and the prediction of the model has to be scat- 
tered back. For massively parallel solvers which perform domain decomposition, 
this requires dedicated message-passing communications between the solver and 
the CNN instances. Additionally, since the CNN can only process structured data, 
if the LES is performed on an unstructured mesh, the input and prediction fields 
must be interpolated between the solver mesh and a structured mesh that can be read 
by the CNN. Coupling interfaces such as OpenPALM (Duchaine et al. 2015) have 
successfully been used to manage these operations and perform fully coupled sim- 
ulations using the AVBP solver (Lapeyre et al. 2018). The computational overhead 
due to the coupling and the neural network prediction is less than 5%. As a refer- 
ence, the filtering operations used in the Charlette dynamic model typically induce 
overheads of 20-30% (Volpiani et al. 2016; Puggelli et al. 2021). Finally, given the 
large number of parameters of the U-Net, inference is preferably performed on a
GPU. This requires additional care in the coupling implementation, but should not 
limit the deployment of the model given the growing adoption of hybrid CPU-GPU 
supercomputer infrastructures. 
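The gather, predict, and scatter pattern described above can be sketched in serial form as follows. In an actual coupled simulation these steps would be message-passing operations handled by the coupler between solver ranks and the CNN instance; here the domain decomposition is mimicked with array slices, and `placeholder_model` is a hypothetical stand-in for the trained network.

```python
import numpy as np


def split_into_blocks(field, n_blocks):
    """Mimic a 1D domain decomposition: one block of columns per solver rank."""
    return np.array_split(field, n_blocks, axis=1)


def gather(blocks):
    """Assemble the global input field from the per-rank blocks."""
    return np.concatenate(blocks, axis=1)


def scatter(field, block_widths):
    """Send each rank the slice of the prediction that covers its subdomain."""
    edges = np.cumsum(block_widths)[:-1]
    return np.split(field, edges, axis=1)


def placeholder_model(c_bar):
    """Hypothetical stand-in for the CNN: any field-to-field map of equal shape."""
    return 4.0 * c_bar * (1.0 - c_bar)


global_c = np.random.default_rng(1).random((8, 20))
local_blocks = split_into_blocks(global_c, n_blocks=3)

full_input = gather(local_blocks)                # solver -> CNN
full_prediction = placeholder_model(full_input)  # one inference on the whole field
local_predictions = scatter(full_prediction, [b.shape[1] for b in local_blocks])  # CNN -> solver
```

On an unstructured mesh, an interpolation step to and from a structured grid would be added on either side of the inference call.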


6 Conclusion 


The intersection of LES subgrid-scale modeling and machine learning is a promising 
and rapidly growing field in numerical combustion. The large modeling capacity of 
deep neural networks is a strong asset to model complex SGS flame-turbulence phe- 
nomena in a data-rich environment fueled by high-fidelity simulation results. Taking 
inspiration from the computer vision community, a deep CNN U-Net architecture 
is trained to predict the total—resolved and unresolved—flame surface density field 
from the LES resolved progress variable field. The U-Net is built to aggregate multi- 
scale spatial information on the flame front, ranging from the coarse mesh resolution 
to large flame structures, thanks to its wide receptive field. In this sense, it can be 
viewed as an extension of existing dynamic models that combine information at 
the filtered and test-filtered scales. DNS snapshots are filtered and downsampled to 
generate the training and evaluation datasets that are used to evaluate the U-Net in 
an a priori context. On the evaluation set of a slot burner configuration, the U-Net 
consistently matches the target flame surface density distribution, beating the static 
and dynamic versions of the Charlette wrinkling model. More generally, the model- 
ing methodology outlined in this chapter can be applied to any SGS quantity, such 
as the SGS variance of the progress variable. These results open the way to many 
compelling directions for future work. Coupling a deep CNN with a massively par- 
allel fluid solver is a key step towards a posteriori validation. Graph neural networks 
could be explored as alternatives that could operate on arbitrary meshes and complex
geometries. Finally, an issue at the core of the practical deployment of any machine 
learning combustion model is to assess whether it can robustly generalize outside of 
its training distribution, a feature that will need to be demonstrated if these models 
are to replace traditional models in CFD solvers. 
Abstract This chapter describes the use of machine learning (ML) algorithms with
the linear-eddy mixing (LEM) based tabulation for modeling of subgrid turbulence- 
chemistry interaction. The focus will be on the use of artificial neural network (ANN), 
particularly, supervised deep learning (DL) techniques within the finite-rate kinetics 
framework. We discuss the accuracy and efficiency aspects of two different strategies, 
where LEM based tabulation is used in both of them. While in the first approach, 
referred to as LANN-LES, the subgrid reaction-rate term is obtained efficiently using 
ANN in the conventional LEMLES framework, in the other approach referred to as 
TANN-LES, the filtered reaction rate terms are obtained using ANN. First, we assess 
the implications of the employed network architecture, and the associated hyperpa- 
rameters, such as the amount of training and test data, epoch, optimizer, learning 
rate, sample size, etc. Afterward, the effectiveness of the two strategies is exam- 
ined by comparing with conventional LES and LEMLES approaches by considering 
canonical premixed and non-premixed configurations. Finally, we describe the key 
challenges and future outlook of the use of ML based subgrid modelling within the 
finite-rate kinetics framework. 


1 Introduction 


Combustion within energy conversion and propulsion devices such as internal com- 
bustion engines, gas turbines, rocket engines, etc., usually occurs under turbulent 
conditions. The turbulence-chemistry interaction in such devices is characterized by 
highly nonlinear, unsteady, multi-scale, and multi-physics processes, which makes its
investigation a challenging task. Although advancements in experimental diagnostics 
and computational tools have enabled some detailed studies, there are still challenges 
that need to be addressed. For example, while experiments under extreme operat- 
ing conditions are often limited to measurements of fewer quantities, computational 
studies using high-fidelity approaches such as direct numerical simulation (DNS) 
and large-eddy simulation (LES) usually tend to be computationally expensive, and 
limited to a few simpler problems. Specifically, DNS, where all relevant spatial and 
temporal scales are resolved, is used to carry out fundamental studies, but it requires 
simplifications in the geometry, flow conditions, or chemistry to address the computational
cost concerns. On the other hand, LES, where only the large scales are captured and the
effects of the small scales are parameterized using subgrid-scale (SGS) closure models, is
considered a promising strategy (Fureby and Möller 1995; Gonzalez-Juez et al. 2017;
Pitsch 2006), but the computational cost of obtaining statistically converged results is
also not trivial. While SGS closure for reacting LES
remains an ongoing research effort for many approaches, the computational cost is 
a key challenge when employing finite-rate chemistry (FRC) approach with detailed 
chemical mechanisms. Here, we discuss past strategies to develop machine learning 
(ML) tools for LES of reacting flows, with a particular focus on finite-rate kinetics. 

In recent years, rapid advancements in computing resources and data storage 
capabilities have led to increased usage of supervised deep learning (DL) using arti- 
ficial neural network (ANN) (Goodfellow et al. 2016; LeCun et al. 2015) to tackle 
challenging problems from several fields such as computer vision (Krizhevsky et al. 
2012), speech, image and text recognition (Bishop 2006), natural language process- 
ing (Collobert and Weston 2008), health-care (Leung et al. 2014), genetic sequenc- 
ing (Libbrecht and Noble 2015), materials discovery (Pilania et al. 2013), complex 
game playing (Silver et al. 2017), high-energy physics (Baldi et al. 2014), etc. This 
is primarily due to the ability of the DL to effectively deal with high-dimensional 
data and the modeling of complex and nonlinear relationships. DL techniques are 
essentially representational learning methods that employ multiple levels of repre- 
sentation. These techniques transform the representation at one level starting with 
the raw input to an abstract representation at a higher level, which allows learn- 
ing complex nonlinear relationships. The layers of features are learned from huge 
datasets using general-purpose learning procedures. Such a representational learning 
approach enables the discovery of intricate structures in high-dimensional data and 
is therefore amenable to different domains of science and engineering. Furthermore, 
the recent advancements in the back-propagation algorithm, mini-batch stochastic 
gradient, novel architectures such as convolutional neural network (CNN), and recur- 
rent neural network (RNN) have also accelerated a wider adoption of DL techniques 
in different domains of science and engineering (LeCun et al. 2015). 

To apply this approach to LES of reacting flows, the data-driven modeling through 
DL must focus on performance improvements via generalizing a model that captures 
all variations within the data. A conventional deep neural network (DNN) for modeling
of the reaction-rate term is shown in Fig. 1, which is a multilayer fully connected
feed-forward network where the information flows in a forward direction from input
to output. Here, the input comprises the species mass fractions (Y_i with i = 1, 2, ..., k)
and temperature (T), and the output comprises the corresponding reaction-rate
terms (ω̇_i), where k denotes the total number of chemical species. Mathematically,
a DNN defines the mapping A: x → y, where x and y denote the input and output
variables, respectively, and A represents a composition of many different functions,
which can be represented through a network structure. A typical DNN comprises
an input layer, an output layer, and more than one hidden layer. Each layer consists
of several nodes, which are connected to all the nodes in the previous and the following
layers. The complexity of a DNN increases with the number of hidden layers and the
number of nodes per hidden layer. Such a basic network is also referred to as a
multilayer perceptron (MLP). It has been shown that MLPs are universal function
approximators (Hornik et al. 1989). Therefore, with enough layers and nodes, MLPs
can be used to model arbitrarily complex and highly nonlinear functional forms, such
as those needed for closure of the SGS terms while performing LES.


Fig. 1 Schematic of a multi-layer perceptron (MLP) for modeling of the reaction-rate term with
two hidden layers having the vector x = (Y_1, Y_2, ..., Y_k, T) as an input and the vector y =
(ω̇_1, ω̇_2, ..., ω̇_k) as an output
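The mapping from x = (Y_1, ..., Y_k, T) to the k reaction rates can be written in a few lines. The following is a minimal sketch of the forward pass of an MLP with two hidden layers. The layer widths, the tanh activation, the random untrained weights, and the value k = 9 are all illustrative assumptions.

```python
import numpy as np


def init_mlp(layer_sizes, rng):
    """Create (weights, bias) pairs for consecutive layers with random weights."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def mlp_forward(params, x):
    """Propagate a batch of inputs: tanh on hidden layers, linear output layer."""
    for weights, bias in params[:-1]:
        x = np.tanh(x @ weights + bias)
    weights, bias = params[-1]
    return x @ weights + bias


k = 9  # hypothetical number of species
rng = np.random.default_rng(0)
params = init_mlp([k + 1, 32, 32, k], rng)  # input (Y_1..Y_k, T) -> two hidden layers -> k rates

states = rng.random((5, k + 1))             # five sample thermochemical states
rates = mlp_forward(params, states)
print(rates.shape)
```

With untrained weights the outputs are meaningless; training adjusts `params` so that the outputs approximate the reaction rates in the training data.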

ANN algorithms have been used for SGS closure models in the context of 
Reynolds-averaged Navier-Stokes (RANS) and LES in past studies of both non- 
reacting (Beck et al. 2019; Duraisamy et al. 2015, 2019; Ling et al. 2016; Maulik 
and San 2017; Vollant et al. 2017) and reacting (Christo et al. 1995, 1996; Lapeyre 
et al. 2019; Seltz et al. 2019; Sen et al. 2010; Yellapantula et al. 2020) flows. In 
the context of LES of turbulent combustion, there are two key areas of relevance 
(a) the need to use detailed chemical kinetics for an accurate representation of the 
thermochemical state space, and (b) the modeling of the filtered reaction-rate term 
to account for the SGS turbulence-chemistry interaction. Over several years, past 
studies have focused on tackling both of these challenges, and further research is 
still underway. 
To address the challenge related to thermochemical representation, detailed chemical
kinetics can be used for accurate predictions over a wide range of operating
conditions. In contrast, while simplified chemical mechanisms are computationally
expedient, they are known to affect the quality of predictions (Bilger et al.
2005). For several reacting flow conditions, the use of flamelet (Peters 2000; Pitsch 
2006) and other low-dimensional manifold based approaches (Maas and Pope 1992; 
Bradley et al. 1988; Van Oijen and De Goey 2000) have been popular for their com- 
putational tractability. ANN has also been used to store flamelet libraries to reduce 
the computational storage requirements (Kempf et al. 2005; Ihme et al. 2009; Zhang 
et al. 2020). Additionally, it has also been used to model SGS source and transport 
terms (Seltz et al. 2019). Although low-dimensional manifold formulation can be 
used for some problems, often detailed finite-rate chemical mechanism is needed 
to accurately capture the flame dynamics and other features such as extinction,
re-ignition, lean blowout, pollutant emissions, etc. However, FRC-based LES, referred
to hereafter as FRC-LES, becomes computationally intractable for simulation of
practical applications when using a detailed chemical mechanism. The higher com- 
putational cost of FRC-LES is associated with the need to solve a highly stiff ODE 
system resulting from a wide range of time scales associated with different chemical 
species in a complex chemical mechanism, and the need to transport a large number 
of chemical species. In addition to approaches for computational cost reduction such 
as hybrid transported-tabulated chemistry (HTTC) (Ribert et al. 2014) and dynamic 
adaptive chemistry (DAC) (Yang et al. 2017) to name a few, ANN algorithms have 
also been used to address the computational cost concerns of FRC-LES (Christo et al. 
1995; Christo et al. 1996; Sen et al. 2010; Sen and Menon 2010; Zhou et al. 2013; 
Franke et al. 2017; Sinaei and Tabejamaat 2017; Ranade et al. 2021). 

A major challenge for LES of turbulent combustion is the need for accurate 
modeling of the filtered reaction-rate term. It has led to numerous physics-based 
SGS closure models for both low-dimensional manifold and FRC-based approaches. 
The reader is referred to the review articles (Pitsch 2006; Fureby 2009; Gonzalez- 
Juez et al. 2017), where challenges of different modeling paradigms and strengths 
and limitations of several modeling approaches are discussed. The modeling of the 
SGS turbulence-chemistry interaction is key for the accurate prediction of the flame 
dynamics. ANN-based strategies have been employed for computational cost reduc- 
tion of filtered reaction-rate term modeling within both low-dimensional manifold 
(Nikolaou et al. 2019; Lapeyre et al. 2019; Seltz et al. 2019; Yellapantula et al. 2020) 
and FRC (Sen and Menon 2010; Zhou et al. 2013; Franke et al. 2017; Sen et al. 2010) 
formulations. 

Although ANN algorithms have shown some success in LES of turbulent combus- 
tion, further studies are needed to examine the predictive capabilities and robustness 
of such algorithms. The focus of this chapter is to discuss the application of ANN 
while employing one specific subgrid model using the linear eddy mixing (LEM) 
model in LES (referred to as LEMLES) (Menon et al. 1993; Menon and Kerstein 
2011). LEMLES is a two-scale strategy, where the species transport equations are 
solved using a two-step procedure. In the first step, the species transport equations 
and FRC mechanism are solved at the subgrid level using the 1D LEM model (Kerstein
1989), where the LEM model acts as an embedded SGS model for the species
equation as viewed on the LES space- and time-scales. The second step simulates the 
evolution of the computed subgrid scalar fields at the resolved LES level. LEMLES 
has been extensively used in past studies for investigation of a wide range of applica- 
tions, such as gas turbine combustor (Kim et al. 1999), rocket combustor (Srinivasan 
et al. 2015), spray combustion (Sankaran and Menon 2002), scramjet (Menon and 
Jou 1991), etc. Although LEMLES allows for the handling of arbitrarily complex 
chemical mechanisms, its use has so far been limited to moderately complex chem- 
ical mechanisms due to the cost associated with the computation of stiff kinetics. 
ANN algorithm within the framework of LEMLES allows addressing this issue (Sen 
et al. 2010; Sen and Menon 2010), which is the main focus of this chapter. 

The chapter is organized as follows. An overview of ML strategies for modeling 
turbulent combustion reported in the literature is presented in Sect. 2. The formulation 
and application of ANN within LEMLES are discussed in Sects. 3 and 4. Section 5 
discusses the limitations of the past studies that employed ANN within LEMLES. 
Section 6 concludes with a discussion of the future of ML for subgrid modeling of 
turbulent combustion using LEM and their implications. 


2 ML for Modeling of Turbulent Combustion 


As stated in Sect. 1, ML algorithms have been used to reduce the computational cost 
of finite rate chemistry while using different chemistry modeling paradigms (low- 
dimensional manifold or FRC). So, first, a brief overview of ANN-based modeling 
strategy for chemistry and the constituents of ANN models are discussed. Afterward, 
a summary of studies focused on the use of ANN in LES of turbulent combustion is 
presented. 


2.1 ANN Model for Chemistry 


While using the FRC approach, the reaction rate terms are obtained by solving a 
system of first-order ordinary differential equations (ODEs) expressed as: 


$$\frac{dY_k}{dt} = \dot{\omega}_k\left(Y_1, \ldots, Y_{N_s}, T, P\right), \qquad k = 1, 2, \ldots, N_s, \tag{1}$$

where Y_k and ω̇_k denote the mass fraction and the reaction rate for the kth species.
Here, ω̇_k can be obtained for a prescribed chemical mechanism and associated kinetic
parameters, along with temperature T and pressure P. The system of ODEs given
by Eq. 1 is in general stiff, particularly for detailed chemical mechanisms, due to a 
wide range of timescales associated with different chemical species. Therefore, to 
solve Eq. 1, stiff ODE solvers such as the fully implicit double-precision
variable-coefficient ODE solver (DVODE) (Brown et al. 1989) are needed, which tend to be
expensive. ANN can be used to approximate the ODEs with nonlinear regression, 
thus addressing the issue of computational cost. 
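The need for implicit solvers can be seen on a toy problem. The sketch below integrates a linear two-species system with time scales that differ by a factor of 1000, using explicit and implicit (backward) Euler with the same time step. The system matrix, the step size, and the function names are assumptions chosen for illustration; the system is not a chemical mechanism and the integrators are not DVODE.

```python
import numpy as np

# Toy chain A -> B -> (products): A is consumed on a fast time scale (1/1000),
# B on a slow one (1). The eigenvalues of the system matrix are -1000 and -1.
system_matrix = np.array([[-1000.0, 0.0],
                          [1000.0, -1.0]])
y0 = np.array([1.0, 0.0])
dt, n_steps = 0.01, 100  # step chosen to resolve only the slow time scale


def explicit_euler(y, dt, n_steps):
    for _ in range(n_steps):
        y = y + dt * (system_matrix @ y)
    return y


def implicit_euler(y, dt, n_steps):
    lhs = np.eye(2) - dt * system_matrix  # (I - dt * J) y_new = y_old
    for _ in range(n_steps):
        y = np.linalg.solve(lhs, y)
    return y


y_explicit = explicit_euler(y0, dt, n_steps)
y_implicit = implicit_euler(y0, dt, n_steps)
b_exact = (1000.0 / 999.0) * (np.exp(-1.0) - np.exp(-1000.0))  # analytical B at t = 1

print("explicit:", y_explicit)
print("implicit:", y_implicit, "exact B:", b_exact)
```

The explicit scheme diverges unless the step resolves the fastest time scale, whereas the implicit scheme remains stable at the price of a linear (in general nonlinear) solve per step. That per-step solve is the cost an ANN regression aims to avoid.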

ANN regression can be obtained through an MLP (Bishop 1995; Haykin and Network
2004), which involves a sum of nonlinear basis functions, also referred to as
activation functions, and coefficients, which include biases and weights. A typical
MLP with inputs (Y_k, T) and outputs (ω̇_k) is shown in Fig. 1. ANN extracts the
complex relations embedded within a given input/output training dataset through a 
learning procedure, and the extracted complex relations can later be used to predict 
the states on which the training was not performed. The learning process essentially 
adjusts the biases and weights for each layer of the MLP to obtain a minimal error 
at the output layer by using a back-propagation algorithm. These optimal weights 
and biases, along with the specific MLP configuration, form the ANN model. The 
resulting ANN model can then be used for an efficient representation of the complex 
dynamics of chemistry described by Eq. 1. 

A typical ANN model includes parameters, hyperparameters, and training strate- 
gies. The parameters, such as the model coefficients are updated by the ANN model 
during the learning process, and they only require initialization. The hyperparameters 
such as the components of the network architecture are specified for a particular prob- 
lem, which varies from one problem to the next. These include the number of hidden 
layers and neurons, learning rate, momentum during the back-propagation algorithm, 
activation function, epochs, mini-batch size, and dropout. A brief overview of the 
hyperparameters and training strategies is discussed next. 

The two key hyperparameters are the number of hidden layers and the number 
of neurons, which are needed for an accurate representation of complex nonlinear 
input/output relationships. Although increasing them, in general, improves the accu- 
racy, it also makes the network heavy and eventually the accuracy tends to stagnate. 
The activation function is the function through which the weighted sums are passed to
obtain a nonlinear output. The specification of the activation function determines the efficiency
and accuracy of the ANN model. Some of the commonly used activation functions 
include hyperbolic tangent (tanh), rectified linear unit (ReLU), sigmoid, etc. 
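For reference, the three activation functions named above can be written as follows. This is a plain NumPy sketch of their standard definitions.

```python
import numpy as np


def tanh(z):
    """Hyperbolic tangent: output in (-1, 1)."""
    return np.tanh(z)


def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)


def sigmoid(z):
    """Logistic sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


z = np.array([-2.0, 0.0, 2.0])
for activation in (tanh, relu, sigmoid):
    print(activation.__name__, activation(z))
```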

When dealing with big data, it is also inefficient to use the entire dataset at once for
training. Therefore, small batches (mini-batches) of data are typically used for efficient
training, although care needs to be taken to avoid overfitting, in which case the model
would have difficulty fitting any new data. The number of epochs denotes the number of
times the algorithm trains on the entire dataset, and its value is also closely associated
with the accuracy of the model.
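The roles of the mini-batch size, the number of epochs, and the learning rate can be seen in a bare-bones training loop. The sketch below trains a one-hidden-layer MLP with mini-batch stochastic gradient descent and back-propagation on a toy one-dimensional regression target. The target function, the network size, and all hyperparameter values are assumptions for illustration, not settings from the studies discussed in this chapter.

```python
import numpy as np


def forward(params, inputs):
    """Return hidden activations and network output for a batch of inputs."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(inputs @ w1 + b1)
    return hidden, hidden @ w2 + b2


def mse(params, inputs, targets):
    return float(np.mean((forward(params, inputs)[1] - targets) ** 2))


def train(inputs, targets, n_hidden=16, learning_rate=0.02,
          batch_size=32, n_epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    params = [rng.normal(0.0, 1.0, (inputs.shape[1], n_hidden)), np.zeros(n_hidden),
              rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, targets.shape[1])),
              np.zeros(targets.shape[1])]
    initial_loss = mse(params, inputs, targets)
    for _ in range(n_epochs):                      # one epoch = one pass over all data
        order = rng.permutation(len(inputs))       # reshuffle before each epoch
        for start in range(0, len(inputs), batch_size):
            batch = order[start:start + batch_size]
            xb, yb = inputs[batch], targets[batch]
            hidden, output = forward(params, xb)
            # Back-propagation of the mean-squared error through both layers.
            grad_out = 2.0 * (output - yb) / output.size
            grad_hidden = (grad_out @ params[2].T) * (1.0 - hidden ** 2)
            grads = [xb.T @ grad_hidden, grad_hidden.sum(axis=0),
                     hidden.T @ grad_out, grad_out.sum(axis=0)]
            params = [p - learning_rate * g for p, g in zip(params, grads)]
    return params, initial_loss, mse(params, inputs, targets)


data_rng = np.random.default_rng(1)
x = data_rng.uniform(-1.0, 1.0, size=(512, 1))
y = np.sin(np.pi * x)  # toy nonlinear target standing in for a reaction rate
params, initial_loss, final_loss = train(x, y)
print(initial_loss, final_loss)
```

In practice, the loss on a held-out set would be monitored alongside the training loss to detect overfitting.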

The strategies that are commonly specified while obtaining the ANN model 
include initialization of the parameters, data normalization, optimization algorithm, 
and regularization. The initialization of the parameters can be performed based on 
the chosen activation functions and it affects the efficiency of the ANN model. In 
several applications, the input data has different scales, which can affect the rate of 
convergence during the training of the ANN model. For example, in combustion,
the inputs comprise temperature and species mass fractions, which differ by several
orders of magnitude; normalization therefore becomes imperative for improved
performance. The optimizers are algorithms used during training to reduce the
loss function, whose gradients are in turn used to update the weights. The choice of
optimizer can directly affect the convergence of the model during the training stage.
Some commonly used optimizers
include Adam optimizer, gradient descent, stochastic gradient descent, etc. The loss 
function needs to be defined during the training to compute the model error. The 
regularization strategy is useful to avoid the overfitting of the ANN model. 
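As an example of the normalization step, the sketch below standardizes each input feature to zero mean and unit variance using statistics computed from the training data. The value ranges for temperature and mass fraction are made-up numbers chosen only to show the disparity in scales, and z-score standardization is one common choice among several.

```python
import numpy as np


def fit_scaler(samples):
    """Per-feature mean and standard deviation computed from the training data."""
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    std = np.where(std > 0.0, std, 1.0)  # guard against constant features
    return mean, std


def normalize(samples, mean, std):
    return (samples - mean) / std


rng = np.random.default_rng(0)
temperature = rng.uniform(300.0, 2200.0, size=(1000, 1))   # hypothetical range, in K
mass_fraction = rng.uniform(0.0, 1.0e-4, size=(1000, 1))   # hypothetical minor species
raw_inputs = np.hstack([temperature, mass_fraction])

mean, std = fit_scaler(raw_inputs)
scaled_inputs = normalize(raw_inputs, mean, std)
print(raw_inputs.std(axis=0), scaled_inputs.std(axis=0))
```

The same `mean` and `std` must be reused when the trained model is evaluated, so they are stored together with the weights.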

It is apparent that a robust ANN model requires a careful selection of parame- 
ters, hyperparameters, and training strategies. This becomes even more challenging 
for turbulent combustion, which is marked by multi-scale and highly nonlinear pro- 
cesses with multiple regimes and modes of combustion where complex relationships 
between variables representing the thermochemical space exist. Therefore, usually, a 
significant amount of tuning is needed to realize a robust ANN model for a particular 
turbulent combustion application. 


2.2 LES of Turbulent Combustion Using ANN 


An overview of past studies focused on the use of ANN while performing LES 
of turbulent combustion is summarized in Table 1. The table includes some
well-established turbulent combustion models that are used with either a low-dimensional
manifold or a finite-rate representation for chemistry. The FRC models include LEM- 
LES and transported probability density function (TPDF) approaches and the low- 
dimensional manifold approaches include flamelet and flame surface density (FSD) 
approaches. It can be observed that the ANN-based strategy has been used to study 
canonical as well as realistic flow configurations. In addition, both premixed and 
non-premixed modes of combustion have been examined. This illustrates a wide 
range of applicability of the use of ANN for LES of turbulent combustion. 

Some key details of the ANN models employed by the past studies are also 
summarized in Table 1 to identify if there are any commonly used constituents of the
ANN model. These constituents are labeled as ‘T’, ‘O’, ‘f’, and ‘L’ corresponding 
to the type of training datasets, the optimization algorithm, the activation function, 
and the loss function, respectively. As discussed in Sect. 2.1, these are some of the 
key parameters describing the ANN model. 

In general, the training of the ANN model has been performed using different 
types of datasets such as one-dimensional (1D) laminar flamelet, 1D LEM, and DNS 
datasets. There are advantages and limitations of the usage of these types of datasets. 
For example, training solely based on a 1D laminar flamelet cannot account for the
effects of turbulence-chemistry interactions. While this is partly addressed in training
based on the 1D LEM dataset, some key features of turbulent combustion such as
large-scale curvature effects are not accounted for. The DNS datasets account for all
possible states for a particular test configuration and appear to be better than
the other two approaches. However, they have limited predictive capabilities for
conditions that were not present in the DNS dataset, and DNS is computationally prohibitive.

Table 1 Summary of contributions to application of ML in modeling of turbulent combustion. The
ANN model components are labeled as T: Training data, O: Optimization algorithm, f: Activation
function, L: Loss function

| Method  | Configuration | ANN model | Key results |
|---------|---------------|-----------|-------------|
| LEMLES  | Non-premixed syngas flame (Sen and Menon 2010; Sen et al. 2010) | T: 1D LEM, O: SGD, f: tanh, L: MSE | Captured extinction/re-ignition; 5.5 times faster than DI; testing of stiff kinetics needed |
| LEMLES  | Non-premixed methane flame (Sinaei and Tabejamaat 2017) | T: 1D LEM, O: WH, f: tanh, L: MSE | 4.9 times faster than DI; compared ANN-LES, DI-LES, and LUT-LES |
| TPDF    | DLR-A methane flame (Ranade et al. 2021) | T: Flamelets, O: SGD, f: tanh, L: MSE | MLP-SOP based ANN; 3 times faster and 10° times reduced memory requirements compared to MLP based ANN |
| Flamelet | Premixed methane flame (Seltz et al. 2019) | T: DNS, O: SGD, f: ReLU, L: CE | CNN for subgrid source and transport terms |
| Flamelet | Non-premixed methane flame (Kempf et al. 2005) | T: Flamelets, O: -, f: -, L: - | Steady flamelet modeled with ANN; 10° times lower memory |
| Flamelet | Non-premixed Sydney flame (Ihme et al. 2009) | T: Flamelets, O: LM, f: sigmoid, L: RMS | Network performance with respect to accuracy, data retrieval time, and storage requirements |
| FSD     | Premixed methane flame (Lapeyre et al. 2019) | T: DNS, O: SGD, f: ReLU, L: MSE | CNN based subgrid flame wrinkling factor; robust for realistic configurations |
| FSD     | Premixed methane flame (Ren et al. 2021) | T: DNS, O: SGD, f: ReLU, L: MSE | CNN based subgrid flame wrinkling factor; robust for realistic configurations |

The activation function for a neuron in the ANN model defines the output of that
neuron for a given input set. Similar to other fields where ANN has been used, all
three widely popular activation functions, namely, tanh, ReLU, and sigmoid functions
(Karlik and Olgac 2011; Nwankpa et al. 2018) have been used while performing LES
of turbulent combustion. For the optimizer, the stochastic gradient descent (SGD)
algorithm has been typically used. However, some studies have also used Widrow-Hoff
(WH) and Levenberg-Marquardt (LM) algorithms. Finally, mean-squared error
(MSE) has been used commonly for the loss function in these studies.
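
For reference, the three activation functions and the MSE loss named above can be written in a few lines. The following is a minimal NumPy sketch using the standard textbook definitions; the function names are illustrative and are not taken from any of the cited studies.

```python
import numpy as np

def tanh(x):
    """Hyperbolic tangent activation, output in (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """Rectified linear unit, max(0, x) applied element-wise."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Logistic sigmoid activation, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def mse(prediction, target):
    """Mean-squared error between a prediction and a target array."""
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((prediction - target) ** 2))
```

The bounded outputs of tanh and sigmoid are one reason why the network inputs and outputs are usually normalized before training.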
Most of the studies summarized in Table 1 demonstrate an improved performance
in terms of speedup of chemistry computation as compared to a conventional direct 
integration (DI) approach for handling stiff kinetics (other studies may exist and 
hence, this list is not considered comprehensive). In addition, these studies have also 
demonstrated the benefits of the use of ANN in terms of reduced computational 
storage requirements. Some recent studies relying on the use of CNN (Lapeyre et al. 
2019; Ren et al. 2021) have shown the robustness of the approach for accurately simu- 
lating realistic flow configurations where the performance of the CNN based subgrid 
model was shown to be better compared to reference algebraic closures. Overall, the 
past and recent studies clearly demonstrate the potential of ANN-based modeling 
of turbulent combustion. However, further studies are also needed to identify the 
best practices in specifying the hyperparameters and the strategies for attaining a 
successful and accurate ANN model. 


3 Mathematical Formulation with ANN 


In this section, the mathematical formulation of LEMLES with the use of ANN for 
the modeling of chemistry is discussed. First, the governing equations for FRC-LES 
and the subgrid modeling of the scalar fields using LEM are described. Afterward, 
two approaches using ANN, either to model the resolved reaction rates at the subgrid 
level or to directly model the filtered reaction rates including the subgrid effects are 
discussed. 


3.1 Governing Equations and Subgrid Models 


3.1.1 Large-Eddy Simulation 


The LES equations are obtained through Favre filtering of compressible multi-species 
Navier-Stokes equations, which lead to the following conservation equations for 
mass, momentum, energy, and species mass 


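\[ \frac{\partial \bar{\rho}}{\partial t} + \frac{\partial \bar{\rho}\tilde{u}_i}{\partial x_i} = 0, \tag{2} \]

\[ \frac{\partial \bar{\rho}\tilde{u}_i}{\partial t} + \frac{\partial}{\partial x_j}\left[\bar{\rho}\tilde{u}_i\tilde{u}_j + \bar{P}\delta_{ij} - \bar{\tau}_{ij} + \tau_{ij}^{sgs}\right] = 0, \tag{3} \]

\[ \frac{\partial \bar{\rho}\tilde{E}}{\partial t} + \frac{\partial}{\partial x_i}\left[\left(\bar{\rho}\tilde{E} + \bar{P}\right)\tilde{u}_i + \bar{q}_i - \tilde{u}_j\bar{\tau}_{ij} + H_i^{sgs} + \sigma_i^{sgs}\right] = 0, \tag{4} \]

\[ \frac{\partial \bar{\rho}\tilde{Y}_k}{\partial t} + \frac{\partial}{\partial x_i}\left[\bar{\rho}\left(\tilde{Y}_k\tilde{u}_i + \tilde{Y}_k\tilde{V}_{i,k}\right) + Y_{i,k}^{sgs} + \theta_{i,k}^{sgs}\right] = \bar{\dot{\omega}}_k, \quad k = 1, \ldots, N_s. \tag{5} \]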


Here, $\bar{f}$ denotes a spatially filtered quantity corresponding to the variable $f$, and
$\tilde{f}$ is a Favre-filtered quantity, which is defined as: $\tilde{f} = \overline{\rho f}/\bar{\rho}$.
In the above equations, $\rho$ is the density, $u_i$ is the velocity vector, $P$ represents the
pressure, $E$ is the total energy per unit mass, $Y_k$ is the mass fraction of the kth species,
and $N_s$ is the total number of species. In addition, $\tau_{ij}$ is the viscous stress tensor,
$q_i$ is the heat flux vector, and $V_{i,k}$ and $\dot{\omega}_k$ are the species diffusion velocity
vector and the reaction rate for the kth species, respectively. The terms with superscript 'sgs'
are unclosed terms resulting from the filtering operation, which require additional closure models.

The total energy per unit mass in Eq. 4, $E$, is defined as the sum of the internal
energy per unit mass ($e$) and the kinetic energy per unit mass. The corresponding
Favre-filtered total energy per unit mass, i.e., $\tilde{E}$, is given as the sum of $\tilde{e}$,
the resolved kinetic energy per unit mass $(\tilde{u}_i\tilde{u}_i)/2$, and the SGS kinetic energy
per unit mass $k^{sgs} = (\widetilde{u_i u_i} - \tilde{u}_i\tilde{u}_i)/2$.
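
To make the filtering definitions concrete, the following is a minimal sketch of Favre filtering on a periodic 1D field. It assumes a top-hat (box) filter of odd width in grid cells and a single velocity component; the filter kernel, the periodic boundaries, and the function names are illustrative assumptions and are not prescribed by the formulation above.

```python
import numpy as np

def box_filter(f, width):
    """Top-hat filter of odd `width` (in cells) applied to a periodic 1D field."""
    if width % 2 == 0:
        raise ValueError("width must be odd so the stencil is centered")
    f = np.asarray(f, dtype=float)
    half = width // 2
    out = np.zeros_like(f)
    for shift in range(-half, half + 1):
        out += np.roll(f, shift)
    return out / width

def favre_filter(rho, f, width):
    """Density-weighted filter: tilde(f) = bar(rho * f) / bar(rho)."""
    rho = np.asarray(rho, dtype=float)
    f = np.asarray(f, dtype=float)
    return box_filter(rho * f, width) / box_filter(rho, width)

def sgs_kinetic_energy(rho, u, width):
    """k_sgs = (tilde(u u) - tilde(u) tilde(u)) / 2 for one velocity component."""
    u = np.asarray(u, dtype=float)
    u_tilde = favre_filter(rho, u, width)
    return 0.5 * (favre_filter(rho, u * u, width) - u_tilde ** 2)
```

For a constant density field the Favre filter reduces to the plain spatial filter, and for a uniform velocity field the SGS kinetic energy vanishes, which gives two quick sanity checks for any implementation.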

The above system of conservation equations is closed by using an equation of
state through: $\bar{P} = \bar{\rho}\left(\tilde{R}\tilde{T} + T^{sgs}\right)$, and the filtered
enthalpy per unit mass, which is defined as:
$\tilde{h} = \left(\sum_{k=1}^{N_s}\tilde{Y}_k h_k(\tilde{T}) + \sum_{k=1}^{N_s} E_k^{sgs}\right) + T^{sgs}$.
Here, $h_k$ is the specific enthalpy of the kth species, $R$ is the mixture gas constant and
$T^{sgs}$ is an unclosed term resulting from the filtering of the equation of state.

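The filtered viscous stress tensor, $\bar{\tau}_{ij}$, and the filtered heat-flux vector,
$\bar{q}_i$, are given by

\[ \bar{\tau}_{ij} = \overline{2\mu S_{ij} - \tfrac{2}{3}\mu S_{kk}\delta_{ij}} \approx 2\bar{\mu}\left(\tilde{S}_{ij} - \tfrac{1}{3}\tilde{S}_{kk}\delta_{ij}\right), \tag{6} \]

\[ \bar{q}_i = -\overline{\kappa\frac{\partial T}{\partial x_i}} + \bar{\rho}\sum_{k=1}^{N_s}\tilde{h}_k\tilde{Y}_k\tilde{V}_{i,k} + \sum_{k=1}^{N_s} q_{i,k}^{sgs} \approx -\bar{\kappa}\frac{\partial \tilde{T}}{\partial x_i} + \bar{\rho}\sum_{k=1}^{N_s}\tilde{h}_k\tilde{Y}_k\tilde{V}_{i,k} + \sum_{k=1}^{N_s} q_{i,k}^{sgs}, \tag{7} \]

where $\tilde{S}_{ij}$ is the resolved strain-rate tensor, and $\bar{\mu}$ and $\bar{\kappa}$ are
filtered viscosity and thermal diffusivity, respectively, which are approximated using the
resolved quantities.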

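The SGS terms appearing in the above equations require further modeling. These
terms are given as:
$\tau_{ij}^{sgs} = \bar{\rho}\left(\widetilde{u_i u_j} - \tilde{u}_i\tilde{u}_j\right)$,
$H_i^{sgs} = \bar{\rho}\left(\widetilde{E u_i} - \tilde{E}\tilde{u}_i\right) + \left(\overline{u_i P} - \tilde{u}_i\bar{P}\right)$,
$\sigma_i^{sgs} = \left(\overline{u_j\tau_{ij}} - \tilde{u}_j\bar{\tau}_{ij}\right)$,
$Y_{i,k}^{sgs} = \bar{\rho}\left(\widetilde{u_i Y_k} - \tilde{u}_i\tilde{Y}_k\right)$,
$\theta_{i,k}^{sgs} = \bar{\rho}\left(\widetilde{V_{i,k}Y_k} - \tilde{V}_{i,k}\tilde{Y}_k\right)$,
$q_{i,k}^{sgs} = \bar{\rho}\left(\widetilde{h_k Y_k V_{i,k}} - \tilde{h}_k\tilde{Y}_k\tilde{V}_{i,k}\right)$,
$T^{sgs} = \widetilde{RT} - \tilde{R}\tilde{T}$, and
$E_k^{sgs} = \widetilde{Y_k e_k(T)} - \tilde{Y}_k e_k(\tilde{T})$, which
result from the application of the filtering operation to the non-linear terms. In the
expressions for $\theta_{i,k}^{sgs}$, $q_{i,k}^{sgs}$ and $E_k^{sgs}$ here, the repeated index k
does not imply summation. Further details about these terms, their physical relevance and
terms that are typically neglected in LES studies are discussed elsewhere (Fureby and Möller
1995; Ranjan et al. 2016).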

In the context of reacting flows, $Y_{i,k}^{sgs}$, $\theta_{i,k}^{sgs}$, $q_{i,k}^{sgs}$, $T^{sgs}$,
$E_k^{sgs}$ and $\bar{\dot{\omega}}_k$ require closure models. Typically, $q_{i,k}^{sgs}$, $T^{sgs}$,
$\theta_{i,k}^{sgs}$, and $E_k^{sgs}$ are neglected in LES (Fureby and Möller 1995), and therefore,
these terms are neglected here as well. The modeling of the SGS scalar flux $Y_{i,k}^{sgs}$ and
the filtered reaction rate $\bar{\dot{\omega}}_k$ is the key focus here, and they are
discussed further in the following sections.


3.1.2 Subgrid Modeling Using LEM 


The linear eddy mixing (LEM) model (Kerstein 1989) is a stochastic approach to 
model the effects of 3D turbulent mixing in a 1D domain. It was originally a stand- 
alone model to account for the interactions between turbulence, molecular diffusion, 
and reaction kinetics. In LES, the unsteady species and temperature evolution equa- 
tions are solved on a 1D subdomain embedded inside each of the LES cells, where 
the reaction and the diffusion processes are locally resolved, but the effects of 3D 
(assumed isotropic) turbulence are included via randomized stirring events. The gov- 
erning equations for 1D LEM are given by 


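\[ \rho\frac{\partial Y_k}{\partial t} = F_{k,stir} - \frac{\partial}{\partial s}\left(\rho Y_k V_{s,k}\right) + \dot{\omega}_k, \tag{8} \]

\[ \rho C_{p,mix}\frac{\partial T}{\partial t} = F_{T,stir} + \frac{\partial}{\partial s}\left(\kappa\frac{\partial T}{\partial s}\right) - \frac{\partial}{\partial s}\left(\sum_{k=1}^{N_s} h_k\rho Y_k V_{s,k}\right) - \sum_{k=1}^{N_s} h_k\dot{\omega}_k, \tag{9} \]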

where 's' represents the co-ordinate along the 1D LEM domain. The terms $F_{k,stir}$
and $F_{T,stir}$ represent stirring events in the above equations. The turbulent stirring is
implemented as stochastic events, based on the so-called triplet maps (Kerstein 1989),
that attempt to mimic the effect of vortices on the scalar field. Successive folding
and compressive motions are modeled during these events, with their time and length scales
governed by the nature of turbulence. This also allows for capturing a thickened
reaction zone at high turbulence intensity, as the stirring time-scales get smaller, and
small-sized eddies can disturb the reactive/diffusive flame structure.
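
The folding and compression performed by a single stirring event can be illustrated with a discrete triplet map. The sketch below is one common discrete form, written here as an assumption rather than as the implementation used in the cited works: the selected segment is replaced by three compressed copies of itself, with the middle copy reversed. The segment length is assumed to be a multiple of three cells, and the function name and arguments are hypothetical.

```python
import numpy as np

def triplet_map(field, start, length):
    """Apply a discrete triplet map to `length` cells of a 1D field, beginning at `start`.

    The segment is split into three compressed images (every third cell),
    and the middle image is reversed. The map is a permutation of the cells,
    so the sum of the scalar over the segment is unchanged.
    """
    field = np.array(field, dtype=float)  # work on a copy
    if length % 3 != 0:
        raise ValueError("segment length must be a multiple of 3")
    if start < 0 or start + length > field.size:
        raise ValueError("segment must lie inside the domain")
    segment = field[start:start + length]
    mapped = np.concatenate([segment[0::3], segment[1::3][::-1], segment[2::3]])
    field[start:start + length] = mapped
    return field
```

Applied to a linear profile 0, 1, ..., 8, the map returns 0, 3, 6, 7, 4, 1, 2, 5, 8: the local gradient is tripled in magnitude and changes sign twice, which is the gradient steepening that the stirring events are meant to emulate.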

The 1D LEM domain is notionally aligned in the flame-normal direction as shown 
in Fig. 2a. The LEM has also been coupled with LES for subgrid closure of the terms 
discussed in the previous section, wherein, the 1D LEM domain is embedded within 
each LES cell, as shown in Fig. 2b. Two approaches, linear eddy mixing model with 
large eddy simulation (LEMLES) (Menon and Kerstein 2011), and reaction-rate 
closure for large eddy simulation (RRLES) (Ranjan et al. 2016; Panchal et al. 2019) 
have been used in the past, and they are briefly summarized below. 

The LEMLES approach models the species evolution equation, i.e., Eq. 5 with
unclosed terms $Y_{i,k}^{sgs}$ and $\bar{\dot{\omega}}_k$, altogether. The species mass fractions
are not evolved on the LES grid, but rather only on the 1D LEM domains embedded within the 3D LES
computational cells. Since the flame is resolved on the 1D domain, the grid resolution
can be chosen to be fine enough to resolve the reaction and the diffusive terms, thus
eliminating the need for any further closures. However, closures are needed for the
subgrid turbulent mixing and the large-scale convection. While the subgrid mixing
is modeled through the stochastic stirring events, the large-scale transport is modeled
using a Lagrangian transport through the splicing algorithm (Menon and Kerstein 2011).
With this approach, chunks of the 1D LEM domain (with $Y_k$ and $T$) are transported
across the LES cells along the direction of convection.

Fig. 2 Sketch of the 1D LEM embedded along the flame for its standalone application (a),
or within the LES cells for LEMLES/RRLES (b). Panels: (a) Standalone LEM, (b) LEMLES

LEMLES has been successfully used in the past for a wide variety of problems, 
including premixed (Sankaran and Menon 2005), non-premixed (Sen et al. 2010; 
Srinivasan et al. 2015) and spray (Sankaran and Menon 2002; Patel and Menon
2008) flames over a range of conditions. However, there are certain disadvantages
of the LEMLES approach. A key limitation is that the reduction to a 1D notional
dimension limits its ability in cases where the flame has to propagate in 3D as
opposed to fluctuating around a statistically mean direction. At high Re, the turbulent
diffusion usually dominates the molecular diffusion, which is captured by the 1D 
LEM model, but errors are incurred at low Re, or towards the DNS limit, where 
molecular diffusion, which is neglected on the large-scale, dominates. 

Considering these drawbacks, the RRLES approach (Ranjan et al. 2016; Panchal
et al. 2019) is a recent modification of the LEMLES approach, where only the filtered
reaction-rate terms $\bar{\dot{\omega}}_k$ are modeled using a multi-scale LEM framework. Here,
the filtered species equations, Eq. 5, are still solved on the 3D grid, where a conventional
gradient-diffusion closure is used for $Y_{i,k}^{sgs}$, whereas the filtered reaction rate term
$\bar{\dot{\omega}}_k$ is modeled using LEM. At every time step of the evolution of the LES
equations in 3D, the filtered species mass fractions ($\tilde{Y}_k$) and the filtered temperature
($\tilde{T}$) evolving at the resolved level are used to reconstruct the SGS variation on the 1D
notional LEM domain inside each LES cell. After solving the subgrid reaction-diffusion
equation and including the effect of turbulent mixing on the LEM domain, the filtered
reaction rates are computed and projected back to the 3D LES grid. The RRLES
approach has an advantage over the original LEMLES approach, particularly in a
well-resolved or a locally laminar condition, where it can asymptote to the DNS limit.
However, this approach cannot account for counter-gradient transport of scalars, and
the sensitivity of results to the reconstruction procedure is another uncertainty (Ranjan
et al. 2016).


3.2 ANN Based Modeling 


As discussed in Sects. 1 and 2, ANNs can be considered as highly non-linear regression
models, and they are used here to model the reaction rate terms $\dot{\omega}_k$ and
$\bar{\dot{\omega}}_k$ described in the previous section.


3.2.1 Problem Definition: Resolved Reaction Rates 


The conventional FRC allows for the inclusion of arbitrarily complex chemical
kinetic mechanisms that can range from O(10) to O(100) species and reactions.
The individual reaction rates are computed using Arrhenius rate expressions, and
these computations can get expensive with an increasing number of species/reactions.
Even with reduced chemical kinetics, a stiff direct integration (DI) solver such as
DVODE may have to be used, which can result in a significant computational cost,
ranging from 60 to 90% of the total computational cost of a simulation (Sen et al. 2010). As
discussed in Sects. 1 and 2, a solution to this could be to tabulate these source terms
over a range of conditions, instead of computing them through DI at each simulation step.
However, this table would become very large and highly multi-dimensional as it would have
$N_s + 1$ input ($Y_k$, $T$) and $N_s$ output ($\dot{\omega}_k$) variables. Therefore, instead of
tabulation, the ANN model denoted by $\mathcal{A}_k$ for the kth species is employed for
estimating the reaction rates as:


\[ \dot{\omega}_k = \mathcal{A}_k\left(Y_1, Y_2, \ldots, Y_{N_s}, T\right), \quad \text{for } k = 1, 2, \ldots, N_s. \tag{10} \]
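
The storage argument against direct tabulation can be made concrete with a short calculation. The sketch below assumes a hypothetical uniform table with the same number of points along every input direction; the function name and the choice of 10 points per input are illustrative only.

```python
def lookup_table_entries(n_species, points_per_input):
    """Number of stored values in a uniform table of reaction rates.

    Inputs: n_species mass fractions plus temperature (n_species + 1 dimensions).
    Outputs: one reaction rate per species at every table node.
    """
    n_inputs = n_species + 1
    n_outputs = n_species
    return n_outputs * points_per_input ** n_inputs

# Hypothetical example: 11 species, 10 points per input direction.
entries = lookup_table_entries(11, 10)
bytes_needed = 8 * entries  # 8 bytes per double-precision value
```

Under these assumptions the table holds 11 x 10^12 values, or roughly 88 terabytes in double precision, even for a small mechanism and a coarse discretization. This is the scaling that motivates replacing the table with a regression model.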


Considering the range of time scales associated with different chemical species, a
separate multi-input, single-output MLP is used for each species. Each neuron
in the ANN model $\mathcal{A}_k$ contains weights and biases, and their training is discussed in
the next sections. The capabilities of the ANN model have been assessed using three
chemical mechanisms in the past studies (Sen and Menon 2009, 2010; Sen et al. 2010).
These include (A) an 11-steps-14-species syngas/air (Sen et al.
2010) skeletal mechanism for premixed flames, (B) a 12-steps-16-species methane/air
skeletal mechanism (Sung et al. 1998) for premixed flames, and (C) a 21-steps-11-species
syngas (Hawkes et al. 2007) mechanism for non-premixed flames. Note
that an independent ANN model and training dataset are required for each chemical
kinetic mechanism.
3.2.2 Training Algorithm


The training of the ANN model comprises two stages, namely a forward propagation
of the input and a backward propagation of the error. The output of a single
neuron i at iteration number k is calculated as

\[ y_i[k] = f\left(\sum_{m=0}^{M} W_{im}[k]\,y_m[k] - b_i[k]\right). \tag{11} \]


Here, $W_{im}[k]$ is the weight coefficient between neurons i and m, $y_m[k]$ is the output
of the neuron m, $b_i[k]$ is the bias of the neuron i, and M is the number of neurons
feeding into the neuron i. As described in Sect. 2.1, there are several options for
specifying the activation function $f(\cdot)$. All the results presented in this chapter use
the hyperbolic-tangent (tanh) as the activation function.
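
The forward propagation through one multi-input, single-output network per species can be sketched as follows. This is an illustrative NumPy sketch, not the implementation of the cited studies: the weights are random and untrained, the bias is subtracted as in the neuron output relation above, and applying tanh on the output layer as well is an assumption. The helper names `init_mlp`, `forward`, and `reaction_rates` are hypothetical.

```python
import numpy as np

def init_mlp(n_inputs, hidden_sizes, rng):
    """Create (weights, bias) pairs for an MLP with a single output neuron."""
    sizes = [n_inputs, *hidden_sizes, 1]
    return [(rng.normal(0.0, 0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Forward propagation: each neuron computes tanh(sum_m W_im y_m - b_i)."""
    y = np.asarray(x, dtype=float)
    for weights, bias in layers:
        y = np.tanh(weights @ y - bias)
    return y[0]

def reaction_rates(networks, mass_fractions, temperature):
    """Evaluate one network per species on the shared input (Y_1..Y_Ns, T)."""
    x = np.append(mass_fractions, temperature)
    return np.array([forward(net, x) for net in networks])

# Usage with 3 hypothetical species and a 5/3/2 hidden-layer structure.
rng = np.random.default_rng(0)
networks = [init_mlp(n_inputs=4, hidden_sizes=[5, 3, 2], rng=rng) for _ in range(3)]
rates = reaction_rates(networks, np.array([0.2, 0.3, 0.5]), temperature=0.1)
```

The inputs here are assumed to be already normalized; the returned values are normalized reaction rates that would need to be rescaled to physical units.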

To tune the model weights and biases during the training of the
ANN model, the mean squared error ($E$) of the network is typically minimized using
a gradient descent rule (GDR), i.e.,

\[ W_{im}[k+1] = W_{im}[k] - \eta\frac{\partial E[k]}{\partial W_{im}[k]}, \tag{12} \]

where k is the GDR iteration step. Standard GDR may not be able to deal with
error surfaces that have local minima where it could get trapped, and therefore, a
momentum modification is used as

\[ W_{im}[k+1] = W_{im}[k] - \eta\frac{\partial E[k]}{\partial W_{im}[k]} - \alpha\frac{\partial E[k]}{\partial W_{im}[k-1]}. \]

Here, $\eta$ and $\alpha$ are the model hyperparameters, namely the global learning rate and
the momentum coefficient, respectively. These model hyperparameters need to be calibrated
for each new case for optimum convergence; otherwise, a modification similar to the
extended delta-bar-delta (EDBD) (Minai and Williams 1990) learning model has to
be used. In the current approach, each connection has its own model parameters
($\eta_{im}$, $\alpha_{im}$), and they are updated at every ANN iteration based on the history
of the global error as:

\[ \eta_{im}[k+1] = \eta_{im}[k] + \Delta\eta_{im}[k], \tag{13} \]

\[ \Delta\eta_{im}[k] = \begin{cases} \kappa_1\lambda\,\eta_{im}, & \text{if } \phi_{im}[k]\,\bar{\phi}_{im}[k-1] > 0 \\ -\kappa_1\lambda\,\eta_{im}, & \text{if } \phi_{im}[k]\,\bar{\phi}_{im}[k-1] < 0 \\ 0, & \text{if } \phi_{im}[k]\,\bar{\phi}_{im}[k-1] = 0. \end{cases} \tag{14} \]

Here, $\lambda = \left(1 - \exp(-\kappa_2\bar{\phi}_{im}[k])\right)$, $\phi_{im}[k] = \partial E[k]/\partial W_{im}[k]$,
and $\bar{\phi}_{im}[k] = (1 - \theta)\,\bar{\phi}_{im}[k-1] + \theta\,\phi_{im}[k]$. Furthermore,
$\kappa_1$ and $\kappa_2$ are second-order model-coefficients,
which are specified to be 0.1 and 0.01, respectively, based on numerical experiments.
Some salient features of this training approach are as follows:

• Each connection has its own learning coefficients.

• Changes to the model coefficients are performed based on the value of the local
error gradients ($\phi_{im}[k]$ and $\bar{\phi}_{im}[k-1]$), where the updates are enhanced in the
regions of large error gradient, and reduced near a minimum.

• Instead of training using the mini-batch approach, the updates are done after
introducing the whole training set to establish a correlation between $\phi_{im}[k]$ and
$\bar{\phi}_{im}[k-1]$.

• In case the weights start to increase without bounds, the coefficients are reverted
to a previously saved state.

Further details about this approach can be found elsewhere (Sen and Menon 2009,
2010). However, the application of more advanced approaches developed
in the ML community, e.g., the Adam optimizer (Kingma and Ba 2014), to the
problems considered here needs to be evaluated in the future.


3.2.3 Training Dataset 


For the ANN model to be able to accurately model the reaction rates $\dot{\omega}_k$, the training
set has to cover the range of conditions, i.e., $Y_k$ and $T$, that would be encountered
during the 3D simulations. Since the training set has to be generated using DI, the
cost of its generation is another concern. For example, even though a DNS of the 3D 
application problem can generate all the states accessed during the simulation, it is 
not computationally feasible to do so for training, thus requiring alternate approaches. 
The results presented here consider the following three methods for obtaining the 
training dataset: 


• FANN: The training set is generated using the tables extracted from a 2D flame-vortex
interaction (FVI) simulation (Poinsot et al. 1991; Sen and Menon 2009).
A premixed flame is initialized corresponding to the inflow equivalence ratio and
temperature, and the coherent vortex diameter ($D_C$) is chosen to be the same as the
integral length scale $L$ of the 3D application. Since turbulence is a superposition
of multiple vortices, the maximum velocity induced by the vortex, $U_{C,max}$, is varied
in the range $10 < U_{C,max}/S_L < 400$, where $S_L$ is the laminar premixed flame
speed. Six cases have been considered within this range, and training samples are
obtained from multiple snapshots.

• PANN: The training set for PANN is generated using tables obtained from 1D
laminar premixed flame simulations (Sen et al. 2010). The inflow equivalence ratio
and temperature are specified based on premixed flame operating conditions. A
limitation of this approach is that no information about the turbulence is embedded
within the training dataset.

• LANN: Here, the training set is generated using standalone 1D LEM simulations
(Sen et al. 2010). As LEM is supposed to emulate the effects of turbulence on a
flame, the resulting training dataset accounts for some effects of turbulence.
This approach can be used for either premixed or non-premixed flames. The
laminar flame is initialized on the 1D LEM domain, and the reaction, diffusion, and
stirring equations are solved as described earlier. For premixed cases, the initial
profile is a function of the equivalence ratio (ER) and inflow temperature, and for
the non-premixed cases, it is also a function of the strain rate. In this approach, the
turbulent Reynolds number $Re_t$ can be varied, which for the cases considered here
has been varied from 10 to 180 (with 20 values in between) for LEM, and the
integral length scale $L$ corresponds to the specific 3D application.


The above strategies are computationally cheaper compared to the dataset gen- 
eration using 3D simulations. The three approaches have different levels of fidelity 
in terms of embedding the effects of subgrid turbulence-chemistry interactions in 
the training datasets. For example, while PANN completely ignores the subgrid
turbulence-chemistry interactions, LANN accounts for them, albeit in the form of stochastic
stirring events. Alternate strategies need to be examined further to have an increased
fidelity of the training dataset that can be generated in an efficient manner. These 
strategies will also need to incorporate the effects of other input variables such as 
pressure (and possibly heat loss) to enable applications to practical configurations. 


3.2.4 Structure of ANN 


Given the training dataset and the algorithm, the next step is to choose the ANN structure,
e.g., the number of neurons and hidden layers, and the normalization of the inputs and
outputs. A typical training dataset considered here contains approximately 5 million states.
The database is first divided into nine equidistant temperature bins, and at least 100,000
data points are added to each bin to achieve proper sensitivity to temperature in reaction
rate calculations. A typical flame solution would have a large number of points
in the reactants and the products but not so many within the flame region, and this
binning ensures that the ANN is not biased toward the unburnt and burnt states. The inputs
and the outputs to the ANN are then normalized between ±1 and ±0.8, respectively, to increase
the sensitivity to each parameter and remove any bias towards species with higher mass
fractions. An 85/15 training/testing split has been considered to realize the ANN model.
The training is stopped if there is no improvement in consecutive iterations to avoid
overfitting.
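
The data preparation steps described above (temperature binning, normalization, and the training/testing split) can be sketched as follows. This is a simplified illustration under stated assumptions: a fixed number of samples is drawn from each bin (with replacement when a bin is sparsely populated), the scaling is a linear min-max map, and all function names are hypothetical.

```python
import numpy as np

def balance_by_temperature(temperature, n_bins, per_bin, rng):
    """Return sample indices with `per_bin` samples from each equidistant temperature bin."""
    edges = np.linspace(temperature.min(), temperature.max(), n_bins + 1)
    bin_index = np.clip(np.digitize(temperature, edges) - 1, 0, n_bins - 1)
    chosen = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_index == b)
        if members.size:
            # Sample with replacement only if the bin has too few states.
            chosen.append(rng.choice(members, size=per_bin, replace=members.size < per_bin))
    return np.concatenate(chosen)

def scale(values, limit):
    """Linearly map each column of `values` onto [-limit, +limit]."""
    lo, hi = values.min(axis=0), values.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return limit * (2.0 * (values - lo) / span - 1.0)

def split(n_samples, train_fraction, rng):
    """Randomly split sample indices into training and testing sets."""
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]
```

In this sketch, inputs would be scaled with `limit=1.0` and outputs with `limit=0.8`, followed by `split(n, 0.85, rng)` for the 85/15 partition.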

The ANN can have multiple hidden layers. However, a smaller network would
struggle with predicting complex reaction rate manifolds, whereas a larger network
would result in a larger number of connections and a higher computational cost. To
understand this, multiple ANN structures have been considered, and a few representative
networks for the chemical mechanism C are summarized in Table 2. The
corresponding computational speedups, with respect to DI, are plotted in Fig. 3. A
significant slowdown occurs beyond 500 connections, and the ANN is even slower than
DI beyond 20,000 connections. Considering this, and the testing errors in Table 2,
5/3/2 is selected as the optimal network for this particular kinetics, and it results in a 5
times speedup with testing errors below $10^{-4}$. The optimal networks for mechanisms
A and B are 10/5 and 10/8/4, respectively, and they result in 11 and 35 times speedup
as compared to the corresponding DI. The larger speedup in mechanism B results
from its stiffness. The number of training samples was always specified to be more than
10 times the number of neurons to avoid overfitting.

Table 2 Number of connections and testing errors corresponding to different ANN architectures.
The table is reproduced using the data from Sen and Menon (2010)

| Number of neurons in hidden layers | log10 (Testing error) | Connections |
|------------------------------------|-----------------------|-------------|
| 5                                  | -3.521                | 230         |
| 10                                 | -3.801                | 340         |
| 5/4                                | -3.889                | 358         |
| 10/5                               | -3.920                | 500         |
| 20/5                               | -4.114                | 710         |
| 5/3/2                              | -4.201                | 371         |
| 20/10/5                            | -4.870                | 1240        |

Fig. 3 Speedup of ANN against DI with various numbers of connections (vertical axis: ANN
speedup). The figure is reproduced using the digitized data from Sen and Menon (2010)

Note that the errors discussed in this section are testing errors based on the dataset
that was selected for training, and not the actual errors as they would result in a 3D
application. The latter can occur when thermochemical states that are accessed
by the ANN model were not available in the training dataset. Further details about
these errors are discussed later.


3.2.5 Modeling Filtered Reaction Rates 


Prediction of $\dot{\omega}_k$ using ANN was discussed in the previous section, and these can
be used instead of DI, either for a direct numerical simulation (DNS) or with the
LEMLES/RRLES approach but within the LEM domain where a turbulence closure
is not required for the reaction rates. Solution of LEM within each LES cell could still
be costly for problems of practical interest, and therefore, a modified LES approach,
referred to as TANN, where the filtered reaction rates $\bar{\dot{\omega}}_k$ are directly computed
using ANN was developed (Sen 2009). This approach has similarities with the RRLES
approach; for instance, the subgrid species flux $Y_{i,k}^{sgs}$ is computed using a
gradient-diffusion approach. However, instead of using the LEM solver online within each
cell as the simulation progresses, the ANN for the filtered reaction rates is trained beforehand.
The filtered reaction rates for the kth species are modeled using the ANN model $\mathcal{B}_k$
through

\[ \bar{\dot{\omega}}_k = \mathcal{B}_k\left(\tilde{Y}_1, \tilde{Y}_2, \ldots, \tilde{Y}_{N_s}, \tilde{T}, Re_\Delta, \frac{\partial \tilde{T}}{\partial x}, \frac{\partial \tilde{Y}_1}{\partial x}, \ldots, \frac{\partial \tilde{Y}_{N_s}}{\partial x}\right). \tag{15} \]

Here, $Re_\Delta$ corresponds to the subgrid Reynolds number $u'\Delta/\nu$, where $\Delta$ is the LES
filter size, and $u' = \sqrt{2k^{sgs}/3}$. Previously described methods for ANN training and
selection of optimal architecture have also been used with this approach. The ANN
training database for TANN is constructed using standalone LEM solutions. Initializing
with species and temperature profiles corresponding to laminar flames, a range
of $Re_t$ and $L$ is explored corresponding to the conditions for the 3D application.
The obtained 1D LEM solutions at multiple time instances are then filtered with size
$\Delta$ and they are then used for ANN training.

Since the velocity field is not available from standalone LEM, $Re_\Delta$ cannot be
computed from $u'$ or $k^{sgs}$. For this, an additional equation for kinetic energy $k(s)$ is
solved on the LEM domain as

\[ \frac{\partial k}{\partial t} = P_k - \epsilon, \]

where $P_k$ and $\epsilon$ are turbulence production and dissipation rates, respectively. A local
velocity disturbance field $u^{LEM} = \nu Re_t/L$ is computed on the segment where stirring
is applied, and this is used as $P_k = \frac{3}{2}(u^{LEM})^2/\Delta t$ and $\epsilon = (u^{LEM})^3/\Delta s$ to compute
the production and the dissipation terms, respectively. Here, $\Delta t$ and $\Delta s$ are the time
and space discretizations for the LEM domain, and this follows the assumption that
the turbulence that is modeled by LEM is homogeneous. The evolved $k$ over the
entire domain is then filtered to compute $k^{sgs}$ and $Re_\Delta$.
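
The algebraic relations used to obtain the subgrid Reynolds number from standalone LEM data can be collected in a few helper functions. The sketch below assumes the relations read as u_LEM = nu Re_t / L, P_k = (3/2) u_LEM^2 / dt, epsilon = u_LEM^3 / ds, u' = sqrt(2 k_sgs / 3) and Re_Delta = u' Delta / nu; the function names and the numerical values in the usage lines are hypothetical.

```python
import math

def lem_velocity_scale(nu, re_t, integral_length):
    """Local velocity disturbance on a stirred segment, u_LEM = nu * Re_t / L."""
    return nu * re_t / integral_length

def production_and_dissipation(u_lem, dt, ds):
    """Source terms of the kinetic energy equation on the LEM line."""
    production = 1.5 * u_lem ** 2 / dt
    dissipation = u_lem ** 3 / ds
    return production, dissipation

def subgrid_reynolds_number(k_sgs, filter_size, nu):
    """Re_Delta = u' * Delta / nu with u' = sqrt(2 k_sgs / 3)."""
    u_prime = math.sqrt(2.0 * k_sgs / 3.0)
    return u_prime * filter_size / nu

# Usage with illustrative SI values.
u_lem = lem_velocity_scale(nu=1.0e-5, re_t=100.0, integral_length=1.0e-2)
p_k, eps = production_and_dissipation(u_lem, dt=1.0e-6, ds=1.0e-4)
re_delta = subgrid_reynolds_number(k_sgs=1.5, filter_size=1.0e-2, nu=1.0e-5)
```

In an actual workflow the kinetic energy would be integrated in time along the LEM line and then filtered over the width Delta before `subgrid_reynolds_number` is evaluated.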


4 Example Applications 


In this section, results from the application of the different ANN-based modeling
strategies discussed in Sect. 3 are described for four test configurations.
These cases correspond to different modes (premixed and non-premixed)
of combustion and demonstrate the application to configurations with an increasing
degree of geometric complexity. The first test case is a canonical premixed
flame-turbulence-vortex interaction configuration where the results are compared for
LEMLES between DI, LANN, PANN, and FANN. The second test case corresponds to
a non-premixed temporally evolving jet flame that exhibits extinction and re-ignition
dynamics in the presence of turbulence, and the results using
LANN-LEMLES and TANN-LES are compared against available DNS data. The
third test considers a stagnation point reversed flow (SPRF) premixed combustor
with LANN-LEMLES and TANN-LES, and finally, the results from a cavity strut
supersonic combustor obtained using TANN-LES are discussed. The third and the
fourth tests illustrate application to practical configurations for which the results are
compared against the available experimental data.


4.1 Premixed Flame Turbulence 


The test configuration follows a previous work (Sen et al. 2010) for premixed
flame-turbulence-vortex interaction for a syngas/air flame. The reacting flow field is
initialized using a 1D laminar steady solution for the premixed flame, and a counter-rotating
vortex pair is superimposed on the isotropic turbulence to induce small- and large-scale
wrinkling. The chemical mechanism A is used for this test configuration and
four different test conditions are considered, which include two equivalence ratios
and two values of $u'/S_L$. Here, $u'$ and $S_L$ denote turbulence intensity and laminar
flame speed, respectively. The ratio of the integral length scale to the laminar flame
thickness is set to 5 so that the flame remains in the thin reaction zone regime.
The maximum induced velocity by the vortex is chosen as $U_{C,max}/S_L = 50$. A $64^3$
uniform grid is used with $\Delta/\eta = 4$, where $\eta$ is the Kolmogorov length scale. The
subgrid 1D LEM domain is spatially discretized using 24 cells. A 10/5 ANN model
is used for this case. The use of ANN for chemistry modeling while performing
LEMLES resulted in approximately 11 times speedup as compared to DI of the chemical
kinetics.

The results for the case with ER = 0.6 and $u'/S_L = 5$ are shown in Fig. 4 at the
non-dimensional time $t^* = 5$, where time is normalized using $L/U_{C,max}$. For the sake
of brevity, only spatially averaged profiles of a major species, $H_2$, and two intermediate
species, namely H and O, are shown here, but
the other species also showed a qualitatively similar trend. The model PANN shows
the highest error with respect to DI even for the major species $H_2$, where it shows
an early consumption of the fuel, which can be associated with a faster consumption
speed, and the errors for PANN are even higher for the minor species.

The results with the other two models, namely, FANN and LANN are comparable 
to DI for this particular test case, suggesting that both the flame-vortex interaction and 
the standalone LEM are capable of covering a range of thermochemical states that 
are encountered during the 3D flame-turbulence interactions. The same conclusions 
are obtained for the other values of ER and u’/S; as well. These results demonstrated 
both the accuracy and the efficiency of the ANN-based modeling approach for chem- 
istry. Furthermore, the results also highlight the importance of the employed training 
datasets on attaining accurate results. 
Fig. 4 Comparison of LES results for premixed flame-turbulence-vortex interaction for syngas/air
at an instance for ER = 0.6 and $u'/S_L = 5$. The figures are reproduced using the digitized data from
Sen and Menon (2010)


4.2 Non-premixed Temporally Evolving Jet Flame 


This computational setup follows a DNS study of turbulent non-premixed syngas/air
combustion in a temporally evolving jet (Hawkes et al. 2007; Sen et al. 2010).
An inner fuel jet and an outer oxidizer jet flow in opposite directions, with a
jet Reynolds number of $Re_{jet} = 4478$ and a Damköhler number of Da = 0.011.
While the DNS was performed using 350 million grid points, 5.5 million
cells ($\Delta/\eta = 8.3$) are used for the LES. The 1D LEM domain is discretized using 12 cells.
For this test case, the chemical mechanism C has been considered. Here, the results from
LANN-LEMLES and TANN-LES are discussed. In terms of the computational cost,
LANN-LEMLES provided a 5.5 times speedup compared to DI-LEMLES, whereas
TANN-LES provided an 18.3 times speedup, showing a significant computational gain.

The time variation of mean temperature at the stoichiometric mixture fraction is 
shown in Fig. 5. The temperature is expected to be the maximum on the stoichiometric 
surface for a non-premixed flame. The initially stable non-premixed flame 
approaches extinction as a result of the shear-generated background turbulence, and 
the temperature decreases from an initial 1450 K to 1100 K at a non-dimensional 
time t* = 20 in DNS. After this time instant, the temperature starts increasing again 
as a result of the re-ignition process, and finally reaches 1300 K at t* = 40, 
close to its initial value. These global features are captured by both LANN-LEMLES 
and TANN-LES, with 5-10% error near extinction. 

The contours of the OH mass fraction in the central x-y plane are shown 
in Fig. 6 at time instants t* = 20 and t* = 40, obtained from the DNS and 
LANN-LEMLES cases. The OH mass fraction from DNS peaks along the shear layers, 
showing a broken structure due to local extinctions at t* = 20, but this is followed 
by re-ignition at t* = 40 within these pockets. Qualitatively, the features observed 
in the DNS case are also captured in the LANN-LEMLES case. 

Mass-fraction and temperature statistics in the compositional space were also 
analyzed for a quantitative comparison of the flame structure predicted by the 
different models. The variation of OH mass fraction is shown in Fig. 7 at t* = 20 
and t* = 40. The results from DNS, LANN-LEMLES, and TANN-LES all drop below the 
laminar flamelet value at extinction at t* = 20, and rise back above it at t* = 40, 
confirming re-ignition. Both LANN-LEMLES and TANN-LES are able to predict this behavior 
and match the DNS data with reasonable accuracy, with TANN-LES providing a 
slightly better match, particularly during the extinction phase. 
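
Conditional statistics of this kind are typically obtained by binning samples in mixture-fraction space and averaging within each bin. The following is a minimal sketch of that procedure, not the post-processing used in the cited studies; the function name conditional_mean, the number of bins, and the sample values are all illustrative assumptions.

```python
def conditional_mean(z, y, n_bins=10):
    """Return (bin_centers, means) of y conditioned on z in [0, 1].

    Bins that receive no samples get a mean of None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for zi, yi in zip(z, y):
        # Clamp so that z == 1.0 falls into the last bin.
        k = min(int(zi * n_bins), n_bins - 1)
        sums[k] += yi
        counts[k] += 1
    centers = [(k + 0.5) / n_bins for k in range(n_bins)]
    means = [s / c if c else None for s, c in zip(sums, counts)]
    return centers, means


# Hypothetical samples: mixture fraction and OH mass fraction.
z_samples = [0.05, 0.08, 0.42, 0.45, 0.48, 0.91]
y_samples = [1e-4, 3e-4, 2e-3, 4e-3, 3e-3, 1e-5]
centers, means = conditional_mean(z_samples, y_samples, n_bins=5)
```

In practice the samples would come from all cells of a DNS or LES snapshot at a given time, and the same binning can be applied to temperature or any other scalar.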

Overall, the results presented here demonstrate the robustness of the ANN-based 
modeling of chemistry. This test case is particularly challenging because of the 
presence of the unsteady dynamics of turbulence-chemistry interaction, which is 
marked by the presence of extinction and reignition events. 

Fig. 6 Contours of OH mass fraction in the central plane at t* = 20 and t* = 40 obtained from DNS 
(a, c) and LANN-LEMLES (b, d) cases for the temporally evolving non-premixed jet configuration. 
Panels: (a) DNS, t* = 20; (b) LANN-LEMLES, t* = 20; (c) DNS, t* = 40; (d) LANN-LEMLES, t* = 40. 
The figures are borrowed from Sen et al. ( ) 


4.3 SPRF Combustor 


The stagnation point reversed flow (SPRF) combustor (see Fig. 8) was designed 
to reduce emissions (Gopalakrishnan et al. ; Undapalli et al. ). It was 
simulated in a premixed mode configuration for evaluating the capabilities of the 
LANN-LEMLES and the TANN-LES approaches (Sen ). Methane/air mixture 
is injected into the combustor at an equivalence ratio of 0.58. The flow enters and 
leaves the combustion chamber in the same plane, providing extensive preheating 
and allowing the flame to stabilize at very lean conditions. The combustion chamber 
marked as region (5) has a wall (6) at the end. Surface (2) is closed and (3) injects 
the premixed mixture, with (4) as the outflow. The annular jet bulk flow velocity 
is 122 m/s, and the mixture is preheated to 500 K, with Re = 12900. The computational 
domain is spatially discretized using approximately 1.2 million cells. The methane/air 
mechanism B is used for this test configuration. For the ANN model, Re_t varying 
from 10 to 400 and the integral length scale L taken as the radius of the whole injector 
assembly (L = 8.25 mm) are considered. In terms of computational cost, LANN-LEMLES 
and TANN-LES showed 49.2 and 134.9 times speedup, respectively, 
compared to DI-LEMLES for this test configuration. 

Fig. 7 Conditional average of Y_OH at t* = 20 and t* = 40 for the non-premixed extinction re-ignition 
test. The symbols have the same meaning as in Fig. 5. The figures are reproduced using the digitized 
data from Sen et al. (2010) and Sen (2009) 

The simulation results using DI-LEMLES, LANN-LEMLES and TANN-LES 
were time-averaged over two flow-through times and compared against experimental 
data along the centerline as shown in Figs. 9 and 10. Both LANN-LEMLES and 
DI-LEMLES are able to capture the far-field axial velocity variation accurately. The 
differences near the injector could be due to differences in the boundary conditions, 
as discussed elsewhere (Sen 2009). The same holds true for the temperature, CH4, and 
CO2 centerline variations: the results show approximately 10% error with respect to 
the experiments, but LANN-LEMLES and DI-LEMLES give similar results. 

Fig. 8 Schematic of the stagnation point reversed flow combustor (dimension labels in the 
schematic: 70 mm; a = 6.35 mm, b = 4.1 mm, c = 12.5 mm, d = 16.45 mm). This figure is borrowed from 
Undapalli et al. (2009) 

Fig. 9 Axial variations of time-averaged temperature and axial velocity for the SPRF combustor. 
These figures are reproduced using the digitized data from Sen (2009) 

Fig. 10 Axial variations of time-averaged mass fraction of CH4 and CO2 for the SPRF combustor. 
These figures are reproduced using the digitized data from Sen (2009) 

The centerline time-averaged variations of axial velocity are worse for TANN-LES 
than for LANN-LEMLES, whereas those of temperature, CH4, and CO2 are better 
with respect to the experiments. It was hypothesized that this could be due 
to differences in the use of LEM between LEMLES and TANN-LES: the 
eddy sizes are restricted to between η and Δ in the former, whereas they range between 
η and L in the latter, which could result in higher wrinkling of the flame front and 
increased turbulence within the combustor. 

The training of the ANN model using the 1D LEM dataset and the subsequent use of 
the model in LES of a practical configuration again demonstrate the efficiency, 
robustness, and generality of the approach. The observed differences 
from the reference results, particularly with TANN-LES, need further study so 
that the accuracy of the approach can be enhanced. Some of these studies are 
currently underway. 


4.4 Cavity Strut Flame-Holder for Supersonic Combustion 


Now, the results from LES of a cavity-based flame-holder are discussed (Ghodke 
et al. 2011). Two configurations, as shown in Fig. 11, were considered: a baseline 
cavity with 11 injectors on the aft ramp (no strut), and a strut positioned upstream 
of the cavity with 6 fuel injectors (with strut). The cavity extends 153 mm in the 
spanwise direction, with a 90° leading edge and a 22.5° ramp at the trailing edge. The 
cavity is 16.5 mm deep with L/D = 2.79, and the length of the cavity floor is 
46 mm. The injected fuel mixture contains 70% methane and 30% hydrogen, whereas 
the mainstream contains air and water vapor at a Mach number of 2. 

The computational grids for both configurations contained approximately 10 mil- 
lion cells, with clustering in the near-wall regions, shear layers, and near the fuel 
injectors. A reduced four-step methane-hydrogen kinetics was used (Peters and Kee 
1987) for the simulations. The ANN model for TANN-LES was trained using the 
previously described method, and a 10/8/4 hidden layer structure was found to be 
optimal. Simulations were performed for a duration of 6 flow-through times, and 
the results are compared between experiments, DI-LEMLES and TANN-LES. Com- 
pared to DI-LEMLES, TANN-LES was around 50 times faster for both the no-strut 
and strut configurations. 
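
To make the 10/8/4 hidden-layer structure concrete, the following is a minimal sketch of a forward pass through such a network with tanh hidden units. Only the hidden-layer sizes and the tanh activation come from the text; the five inputs, the single linear output, and the random weights are assumptions made purely for illustration, and a trained network would use fitted weights.

```python
import math
import random


def init_mlp(layer_sizes, seed=0):
    """Create random weights and zero biases for a fully connected network."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Evaluate the network: tanh on hidden layers, linear output layer."""
    a = list(x)
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * ai for w, ai in zip(row, a)) + b
             for row, b in zip(weights, biases)]
        is_output = i == len(layers) - 1
        a = z if is_output else [math.tanh(v) for v in z]
    return a


# Assumed sizes: 5 inputs (e.g. a few species and temperature), 1 output.
sizes = [5, 10, 8, 4, 1]
net = init_mlp(sizes)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in net)
rate = forward(net, [0.1, 0.2, 0.0, 0.5, 0.3])
```

With these assumed input and output sizes the network has fewer than 200 adjustable parameters, which indicates why evaluating such a network is cheap compared with direct integration of stiff kinetics.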

Figure 12 shows instantaneous temperature contours on a plane normal 
to the spanwise direction for both configurations. Most of the cavity region is 
filled with hot products, which causes lifting of the shear layer for oxidizer entrainment 
into the cavity. The reaction zone is even larger for the configuration with the strut 
due to increased mass and heat transfer between the cavity and the main stream 
as a result of the low-pressure region behind the strut. Vortical structures behind the 
strut are responsible for better mixing of the fuel and for maintaining hot regions inside 
the combustor by mass transfer, which helps flame-holding. 

Fig. 11 Schematic of a supersonic cavity-strut flame-holder (dimension labels in the schematic: 
156.4 mm and 83 mm). The figure is borrowed from Ghodke et al. (2011) 

Fig. 12 Temperature contours on a center-slice at an instant for the supersonic cavity strut 
flame-holder. Panels: (a) no strut, DI-LEMLES; (b) no strut, TANN-LES; (c) strut, DI-LEMLES; 
(d) strut, TANN-LES. The figures are borrowed from Ghodke et al. (2011) 

Fig. 13 Bottom wall pressure comparison against experimental data (Grady et al. 2010) for the 
supersonic cavity strut flame-holder. The strut extends from x = -36 mm to x = 25 mm, and the 
cavity extends from x = 0 mm to x = 86 mm. The figures are reproduced using the digitized data 
from Ghodke et al. (2011) 

Figure 13 shows the wall pressure comparison for the reacting cases with the available 
experimental data (Grady et al. 2010). For both cases, the locations of the leading-edge shock 
(x ~ -30 mm and x ~ 0 mm for the configurations with and without the strut, respectively) 
and the ramp expansion (x ~ 85 mm) are captured well, along with multiple reflections 
off the wall. The pressure inside the cavity is almost constant, and hence this could be 
considered a constant-pressure combustion process. The peak pressures along the 
wall predicted by both DI-LEMLES and TANN-LES are also in good agreement 
with the reference experimental data, illustrating that the heat-release effects are 
accurately captured. 

The use of an ANN-based strategy for modeling subgrid turbulence-chemistry 
interactions in this test configuration demonstrates the robustness of such an approach. 
This could be attributed to the ability of the ANN to accurately represent 
multi-dimensional data in the form of a nonlinear regression, which, in turn, can account 
for complex input/output relations such as those prevalent in this test case, where 
turbulence-chemistry interactions occur under supersonic flow conditions in a complex 
geometrical configuration. Although the approach employed here is able to 
capture the trends both qualitatively and quantitatively, some discrepancies with the 
experimental data can also be seen, which need further investigation. 


5 Limitations of Past Studies 


The results discussed here used ANNs to directly represent the chemical kinetics at 
the subgrid level. Even though the results demonstrated various aspects of the ANN- 
based modeling approach for efficient computations of chemically reacting flows 
while utilizing FRC, there are certain challenges that need further studies. Some of 
the key features of ANN-based modeling that were demonstrated include a signif- 
icant decrease in the computational cost and memory requirements, robustness in 
application to different modes and regimes of combustion, predictive ability in terms 
of decoupling the training dataset from the actual application, etc. Some limitations 
and concerns of the current work are highlighted next in order to stimulate future 
research: 


• Stiff kinetics with complex mechanism: The majority of configurations in the 
current work explored mechanisms comprising 11-16 species with varying levels 
of stiffness. The results discussed provided consistently acceptable predictions 
for all cases. However, scenarios relevant to practical combustion applications 
involve detailed kinetics with on the order of 50-100 species and 1000 
reactions. Hence, the scalability of the current approach to such scenarios needs 
to be investigated further, with specific attention to predictions of minor species, 
stiff radicals, and their interactions with turbulence. 

• ANN architecture optimization: Even though a significant portion of the current 
work is focused on devising an optimal neural network for a general case, 
the sensitivity of the ANN to hyperparameters such as the number of layers, the size 
of the training data generated, the optimization approach, the 
activation functions, and the error estimation techniques needs to be examined further, 
especially with the help of well-established, powerful open-source tools such as 
TensorFlow (Martin et al. 2015) or PyTorch (Paszke et al. 2019). 

• Training data generation: For TANN, the filtered training data is generated 
using a filter width based on the actual LES grid. For canonical 
problems, this information is easily available, as the computational grid is 
almost uniform throughout the domain. However, in complex configurations, 
the computational grid varies significantly due to clustering in specific areas of 
interest, so the filter width definition needs to be revised. Moreover, the ANN model 
trained on the table generated using the standalone LEM computations still inherits 
the assumptions made in the LEM approach used for table generation. 
Therefore, some assumptions involved in standalone LEM computations regarding 
turbulence homogeneity and isotropy, LEM solution initialization using a 1D 
laminar flame solution, stirring operations, etc. need to be revisited for further 
improvements. 

• Offline training: The ANN training approach adopted here is based on an offline 
training philosophy. The training datasets generated using 1D LEM, flame-vortex 
interaction, or 1D laminar flames are used to construct the thermochemical database, 
and once the ANN model is trained on this, it is used as-is in LES without any 
further learning. Therefore, it may produce large errors for 
states that are far from the training data. An alternative method that combines 
offline and online training, in which the ANN is retrained on such states, can 
be adopted. A similar strategy has been employed in a recent DNS study (Chi et al. 
2021). 

• Subgrid modeling of scalar fluxes: In the TANN approach, a gradient-diffusion-based 
eddy diffusivity model is used for the closure of the SGS scalar flux. However, 
in chemically reacting flows both gradient and counter-gradient subgrid turbulent 
transport of the scalar fields are observed (Ranjan et al. 2016). Therefore, 
the predictive capabilities of the TANN-LES strategy can be further improved by 
using ANN-based modeling of the SGS scalar flux. 
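
Regarding the offline-training limitation listed above, one simple way to detect states that lie far from the training data is to compare each query state against the per-variable range of the training inputs. The sketch below is a deliberately crude proxy (an axis-aligned bounding box) and is not the retraining criterion of any cited study; the function names, the margin parameter, and the sample states are hypothetical.

```python
def training_bounds(samples):
    """Per-dimension (min, max) of the training inputs."""
    dims = range(len(samples[0]))
    return [(min(s[d] for s in samples), max(s[d] for s in samples))
            for d in dims]


def outside_training_range(state, bounds, margin=0.0):
    """True if any component of state lies outside the padded bounds."""
    for value, (lo, hi) in zip(state, bounds):
        pad = margin * (hi - lo)
        if value < lo - pad or value > hi + pad:
            return True
    return False


# Hypothetical training inputs: (temperature in K, fuel mass fraction).
train = [(300.0, 0.05), (1200.0, 0.02), (1900.0, 0.0)]
bounds = training_bounds(train)
flag_inside = outside_training_range((1500.0, 0.01), bounds)
flag_outside = outside_training_range((2400.0, 0.01), bounds)
```

States flagged in this way could be handled by direct integration and collected for later retraining. A bounding box ignores correlations between variables, so a distance-based or density-based measure would be a natural refinement.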


6 Summary and Outlook 


Rapid advancements in computational resources have led to an increased usage of 
ML tools, particularly supervised DL, to solve challenging problems in 
science and engineering. DL relying on ANNs is a representation 
learning method, which transforms the representation at one level, starting with the 
raw input, into a more abstract representation at a higher level. This allows learning of 
complex nonlinear relationships and enables the discovery of intricate structures in 
a high-dimensional dataset. In this chapter, different approaches relying on ANN 
algorithms for efficient modeling of the chemistry within the FRC framework have 
been discussed for LES of turbulent combustion. 

The two major challenges associated with FRC-LES include a robust SGS closure 
for turbulence-chemistry interaction and efficient handling of the stiffness associated 
with the use of detailed chemical kinetics. In the LEMLES approach, a two-scale 
strategy is used: LEM is used for the subgrid modeling of the reaction, diffusion, 
and turbulent mixing, and large-scale transport is handled in a Lagrangian manner. 
The approach has been demonstrated in the past for simulations of a wide variety of 
canonical and practical configurations. As it allows for the inclusion of arbitrarily 
complex chemical kinetics and resolves the flame in the 1D LEM domain, ANN- 
based models have been examined in terms of their ability to efficiently model the 
reaction-rate terms. Apart from LEMLES, a conventional LES approach has also 
been discussed where instead of modeling the reaction-rate term at the subgrid level 
as in LEMLES, a model for the filtered-reaction-rate term is devised based on ANN. 

A key step in ANN-based modeling is the generation of the training database, which was 
carried out using three approaches, namely, laminar flame solutions, flame-vortex interactions 
(FVI), and flame-turbulence interactions (FTI) using standalone 1D LEM computations. 
In all three approaches, the thermochemical state space is predicted using 
canonical configurations and with only the knowledge of large-scale parameters of 
the actual geometry of interest. The ANN model trained using these three approaches 
showed the effectiveness of FTI (LANN) and FVI (FANN) approaches over laminar 
flame solutions (PANN) for training data generation and predicting the behavior of 
canonical as well as complex (premixed and non-premixed modes) reacting flow con- 
figurations. The TANN approach utilizes a tabulation model for the filtered reaction 
rates, which does not employ any explicit assumption regarding the interaction of 
turbulence with the laminar flame front, but solves them directly on their respective 
time and length scales using standalone LEM computations. The ANN models con- 
sidered in the example applications were based on a back-propagation algorithm with 
adaptive gradient descent rule (AGDR), and tanh activation function with a simple 
architecture using a maximum of 3 hidden layers, one input, and one output layer. 
Furthermore, during the learning stage of the ANN model, the training was stopped 
when saturation in the training error was observed, in order to maintain the generality 
of the ANN and avoid data memorization. 
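
The stopping rule described above, halting training once the training error saturates, can be sketched as follows. This is an illustrative criterion only: the window length, the relative tolerance, and the synthetic error curve are assumptions, and the original studies may have used a different saturation test.

```python
def has_saturated(errors, window=5, rel_tol=1e-3):
    """True if the error improved by less than rel_tol over the last window epochs."""
    if len(errors) <= window:
        return False
    old, new = errors[-window - 1], errors[-1]
    if old == 0.0:
        return True
    return (old - new) / abs(old) < rel_tol


def train_until_saturation(step, max_epochs=1000, window=5, rel_tol=1e-3):
    """Call step() once per epoch; stop when the returned error saturates."""
    history = []
    for _ in range(max_epochs):
        history.append(step())
        if has_saturated(history, window, rel_tol):
            break
    return history


# Stand-in for a real training epoch: an error that decays toward a plateau.
state = {"epoch": 0}


def fake_epoch():
    state["epoch"] += 1
    return 0.01 + 0.5 ** state["epoch"]


history = train_until_saturation(fake_epoch, max_epochs=200)
```

In a real setting, step() would run one pass of back-propagation over the training set and return the resulting training error.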

The performance of ANN-based modeling strategies was examined in terms of 
their accuracy, robustness, and efficiency using four test cases with an increasing 
degree of complexity. These cases included canonical turbulent premixed and non- 
premixed flames where reference DNS results were used to assess the capabilities of 
different modeling approaches. The robustness of the use of the ANN model for FRC 
was demonstrated through two practical configurations corresponding to a premixed 
combustor and a supersonic cavity flame holder. These cases were simulated using 
three different chemical mechanisms. Overall, ANN-based modeling of chemistry 
within the LEMLES and TANN-LES frameworks was able to capture qualitative features 
of flame-turbulence interactions, and the quantitative statistics were in good 
agreement with direct integration approaches for chemistry. However, some discrepancies 
were also noted in the results, which need further investigation for potential 
improvement of the employed modeling strategies. 

A major challenge in modeling chemistry using ANNs is the accurate representation 
of detailed chemical mechanisms over a wide range of operating conditions; 
such mechanisms usually have a higher level of stiffness due to the wide separation of timescales 
associated with different chemical species. So far, only moderately complex chemical 
mechanisms have been considered when using ANN with LEM, and this needs to 
be extended to detailed chemical mechanisms. While modeling FRC using ANN, 
a multi-input and single-output ANN model is needed for each chemical species, 
which also poses a challenging task for the training process to attain an optimal 
architecture. To obtain an optimal ANN model, parameters, hyperparameters, and 
training strategies need to be specified. While some of the hyperparameters have 
demonstrated their applicability to different types of problems, further usage and 
assessment of ANN algorithms for turbulent combustion modeling can potentially 
lead to some common parameters that may work for a wide range of applications. 

Another key challenge for ANN-based predictive modeling is the efficient gen- 
eration of reliable training data. The data generation procedure should be general 
enough so that it can be used with different types of geometrical configurations, and 
different modes and regimes of combustion. Furthermore, the procedure should be 
efficient to enable a faster generation of training data for a range of input conditions 
that can cover a large thermochemical state space. To this end, 1D LEM-based training 
seems to be a good strategy; however, further improvements are needed. Some 
improvements that should be considered are: accounting for the effects of pressure, 
the use of different types of energy spectra in the LEM equations, considering a range 
of LES filter sizes, etc. In addition, an adaptive training approach (Chi et al. 2021) 
can also be considered by employing a cost function associated with the accuracy 
and efficiency of the ANN model. 

The ANN model for the reaction rate discussed in this chapter relied on a separate 
network for each species. However, the reaction rates of the species are related to each 
other through the constraint of mass conservation. This aspect is not addressed in 
the formulation considered here and can therefore be considered in future studies 
by following the approach used in physics-informed neural networks (Raissi et al. 
2019). Although turbulent combustion modeling in the context of LES has mainly 
focused on robust and accurate modeling of the filtered reaction-rate term, ML tools 
can also be used for modeling the other unclosed terms such as SGS scalar flux, 
temperature, equation of state, etc. Such constraints and improvements by the use 
of ML tools can yield improved predictions, particularly under extreme conditions 
when large variations in thermochemical state space can occur, and therefore, should 
be considered in future studies. 
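
As a simple illustration of the mass-conservation constraint mentioned above, the mass-based reaction rates of all species must sum to zero, which independently trained per-species networks do not guarantee. The sketch below applies a post hoc correction that redistributes the residual in proportion to the magnitude of each rate. This is a swapped-in, simple alternative and not the physics-informed training approach of Raissi et al. (2019); the function name and the sample values are hypothetical.

```python
def enforce_zero_sum(rates):
    """Adjust predicted mass reaction rates so that they sum to zero.

    The residual is removed from each species in proportion to the
    magnitude of its rate, so inactive species stay inactive.
    """
    residual = sum(rates)
    total_abs = sum(abs(r) for r in rates)
    if total_abs == 0.0:
        return list(rates)
    return [r - residual * abs(r) / total_abs for r in rates]


# Hypothetical per-species ANN outputs in kg/(m^3 s); they do not sum to zero.
predicted = [-4.0, -1.0, 5.3, 0.0]
corrected = enforce_zero_sum(predicted)
```

Embedding the constraint in the training loss, rather than correcting afterwards, would be closer in spirit to the physics-informed approach.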
Abstract The application of machine learning algorithms to model subgrid-scale 
filtered density functions (FDFs), required to estimate filtered reaction rates for Large 
Eddy Simulation (LES) of chemically reacting flows, is discussed in this chapter. 
Three test cases, i.e., a low-swirl premixed methane-air flame, a MILD combustion 
of methane-air mixtures, and a kerosene spray turbulent flame, are presented. The 
scalar statistics in these test cases may not be easily represented using the commonly 
used presumed shapes for modeling FDFs of mixture fraction and progress variable. 
Hence, the use of ML methods is explored. In particular, the use of deep neural networks (DNNs) 
to infer joint FDFs of mixture fraction and progress variable is reviewed here. The 
Direct Numerical Simulation (DNS) datasets employed to train the DNNs in each 
test case are described. The DNN performances are shown and compared to typical 
presumed probability density function (PDF) models. Finally, this chapter examines 
the advantages and caveats of the DNN-based approach. 

1 Introduction 


Increasingly stringent regulations on pollutant emissions from fossil fuel combustion 
demand novel combustion technologies that offer high fuel 
flexibility, increased efficiency and low emissions. Moreover, a significant adoption 
of renewable technologies in future years is expected to reduce the carbon footprint 
and meet the long-term objective of CO2 neutrality. Nevertheless, combustion-based 
energy technologies will play a role in the future (or low-carbon) energy mix, as 
discussed in the chapter “Introduction”. Hence, combustion research is called upon 
to provide solutions to the expected challenges arising from issues related to fuel 
flexibility and improving efficiency with pollutant reduction. Current combustion 
studies focus on aspects such as development, validation and uncertainty quantifi- 
cation of new models, and involve either experiments or numerical simulations, or 
both. A collection of these studies represents a massive amount of data that can be 
leveraged to achieve significant progress in combustion science. Utilising this data 
has thus become a new challenge and research opportunity. Data-driven techniques 
such as machine learning (ML) have demonstrated their abilities to extract informa- 
tion from massive data and assist in developing novel models which can be leveraged 
for technology development. 

Machine learning techniques allow statistical inference of 
unknown quantities of interest with reasonable accuracy and confidence by carefully 
training the algorithms using representative data. Since the 1990s, ML has regained 
increasing attention and achieved outstanding results in many areas (Jordan and 
Mitchell 2015), including science, technology, manufacturing, finance, education, 
health care, and many more. Combustion science is no exception to this trend: 
there are many studies demonstrating successful use of ML for combustion, and some 
of these studies date back almost 30 years. Christo and coworkers (Christo et al. 1995, 
1996a, b) first employed a machine learning algorithm, namely the Artificial Neural 
Network (ANN), in the 1990s to deal with chemistry tabulation for turbulent com- 
bustion simulations. These works involved training an ANN to obtain changes in 
the composition of several reactive scalars rather than using the conventional direct 
integration of the relevant equations. Satisfactory results suggested that the ANN 
was able to provide, with computational efficiency, the chemical kinetics informa- 
tion required for turbulent combustion simulations. The computational efficiency 
was mainly noted to come from memory saving. The subsequent studies extended 
this novel approach to more complex chemical systems (Blasco et al. 1998, 1999; 
Chen et al. 2000), where multiple ANNs were proposed for different subdomains of 
the large composition space. The valuable time saving achieved by ANN compared 
with traditional methods was presented. The recent advances on ML applied to chem- 
ical kinetics are discussed in chapters “Machine Learning Techniques in Reactive 
Atomistic Simulations” and “Machine Learning for Combustion Chemistry” with 
different perspectives. 

Blasco et al. (2000) employed two different ANNs, namely the Self-Organising 
Map (SOM) and the Multi-layer Perceptron (MLP), to estimate the thermochemical 
states during a combustion simulation. The SOM was used to partition the thermochemical 
space into subdomains, while several MLPs were trained on each subdomain 
to predict the evolution of the thermochemical space in time. These early 
explorations identified a general route to utilise the ANN for chemistry tabulation 
approaches, although their generality was limited due to the similarity between training 
and testing cases. Consequently, later studies focused on developing ANNs for 
a wider range of combustion conditions. 
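
The partition-then-predict idea can be sketched as a nearest-prototype lookup followed by evaluation of the model that owns the selected subdomain. The code below is an illustration only: fixed prototype vectors stand in for a trained SOM, simple callables stand in for the trained MLPs, and all names and numbers are hypothetical rather than taken from Blasco et al. (2000).

```python
def nearest_prototype(state, prototypes):
    """Index of the prototype closest to state (squared Euclidean distance)."""
    def dist2(p):
        return sum((s - q) ** 2 for s, q in zip(state, p))
    return min(range(len(prototypes)), key=lambda k: dist2(prototypes[k]))


def predict(state, prototypes, local_models):
    """Route state to the model that owns its subdomain and evaluate it."""
    k = nearest_prototype(state, prototypes)
    return k, local_models[k](state)


# Hypothetical prototypes in (normalized temperature, fuel fraction) space.
prototypes = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]
# Stand-ins for trained per-subdomain MLPs.
local_models = [
    lambda s: 0.0,
    lambda s: 2.0 * s[0] * s[1],
    lambda s: 0.1 * s[1],
]
index, value = predict((0.55, 0.4), prototypes, local_models)
```

Because each local model only has to fit a small region of the composition space, it can be kept small, at the cost of one extra lookup per query.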

Sen et al. trained ANNs using unsteady flame-turbulence-vortex interaction cases 
and subsequently used them for Large Eddy Simulations (LES) of syngas/air flames 
quite successfully (Sen and Menon 2009; Ali Sen and Menon 2010; Sen et al. 2010). 
Zhou et al. demonstrated successful application of the ANN to turbulent premixed 
flames by including 1D laminar premixed flame cases at different turbulent inten- 
sities while training the ANN (Zhou et al. 2013). A wider range of combustion 
conditions were also considered in later studies by including non-premixed laminar 
flamelets (Chatzopoulos and Rigopoulos 2013) to include local extinction and reig- 
nition (Franke et al. 2017) and non-adiabatic conditions (Wan et al. 2020, 2021) 
in the training data sets. Furthermore, randomising the non-premixed flamelets 
before using them as training data sets were shown to improve the generality of 
the ANN and helped to capture the behaviour of turbulent premixed flames quite 
well (Readshaw et al. 2021; Ding et al. 2021). Also, other techniques were explored 
to improve the generalisation level of ANN: Chi et al. (2021) trained the ANN on- 
the-fly during a simulation, whereas An et al. (2020) trained their ANN using data 
from Reynolds-averaged Navier-Stokes (RANS) simulations of hydrogen/carbon 
monoxide/kerosene/air mixture in a rocket combustion chamber and tested it for 
LES. 

Beyond chemical kinetics, another application of the ANN focuses on 
replacing the traditional flamelet look-up table, which requires a large memory. The 
general procedure is to set thermochemical scalars, which are the basis of the look-up 
table, as the input of the ANN and to infer the tabulated values. This reduces the 
memory requirement significantly since only the weights and bias(es) of the ANN 
need to be saved. A first successful application was demonstrated in Flemming et al. 
(2005) by building ANNs having the mixture fraction, its variance and its scalar 
dissipation rate as inputs and mass fractions as outputs, and using them in LES 
of the Sandia flame D. This was extended in Kempf et al. (2005) and Emami and 
Fard (2012) to estimate scalar mass fraction variations in a turbulent CH4/H2/N2 jet 
diffusion flame. The optimisation of the ANN architecture, in terms of number of 
hidden layers and neurons per layer, was also explored to improve the predictive 
accuracy of LES of the Sydney bluff-body swirl-stabilised methane-hydrogen flame 
(Ihme et al. 2006, 2008, 2009). 
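
The memory argument can be made concrete by counting stored values. The sketch below compares a structured look-up table with a small fully connected network; the three inputs follow the text (mixture fraction, its variance, and its scalar dissipation rate), whereas the table resolution, the number of outputs, and the network size are assumed values chosen only for illustration.

```python
def table_entries(points_per_dim, n_inputs, n_outputs):
    """Number of stored values in a structured look-up table."""
    return points_per_dim ** n_inputs * n_outputs


def mlp_parameters(layer_sizes):
    """Number of weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


# Assumed sizes: 3 inputs (mixture fraction, its variance, scalar
# dissipation rate), 100 points per input, 10 tabulated outputs.
n_table = table_entries(points_per_dim=100, n_inputs=3, n_outputs=10)
# Assumed network: two hidden layers of 30 neurons each.
n_mlp = mlp_parameters([3, 30, 30, 10])
ratio = n_table / n_mlp
```

The table size grows exponentially with the number of inputs, whereas the network size grows only linearly with the input dimension for a fixed hidden-layer width, which is why the saving becomes more pronounced for higher-dimensional tables.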

The use of ANNs for inferring multi-dimensional flamelet libraries has also been explored 
in recent studies. Owoyele et al. proposed a grouped multi-target ANN approach 
to model 4D and 5D flamelet libraries, respectively, for an n-dodecane spray flame 
under the conditions of the Spray A flame from the Engine Combustion Network (ECN) 
and for methyl decanoate combustion in a compression ignition engine (Owoyele et al. 
2020). Ranade et al. (2021) trained a SOM-MLP method on a 4D Probability Density 
Function (PDF) table and used it for RANS and LES of the DLR-A turbulent jet diffusion 
flame. These works showed that the ANN yielded good accuracy at reduced 
computational costs with low storage space requirements. Similarly, Zhang et al. 
(2020) extended the application of the SOM-MLP algorithm to the Flamelet Gener- 
ated Manifolds (FGM) model by using species mass fractions in mixture fraction- 
progress variable space as training data. This ANN approach was successfully used 
in RANS calculations and LES of ECN Spray H flame to explore the detailed spray 
combustion process. More comprehensive reviews of the applications of ML in com- 
bustion research can be found in Zheng et al. (2020), Zhou et al. (2022) and Ihme 
et al. (2022). 

Presumed PDF shapes are typically used along with tabulated chemistry 
approaches. The PDF of relevant scalars such as mixture fraction and progress 
variable are used to compute averaged temperature, density, species mass fractions, 
and the relevant reaction rates. These quantities can be stored in a look-up table with 
the first two moments of the above scalars as controlling variables. Although widely 
employed in several past studies, presumed PDF (or, in the context of LES, Filtered Density 
Function (FDF)) approaches may not accurately represent the scalar statistical 
behaviour under several conditions, such as extinction and reignition, combustion 
among multiple streams, multi-regime burners, and multi-phase reacting flows. 
FDFs with shapes different from regular distributions such as the Gaussian or β-function 
can also be observed prominently in Moderate or Intense Low-oxygen Dilution 
(MILD) combustion. This combustion mode features broadly distributed reaction 
zones rather than conventional flamelet-like structures, with strong interactions 
between autoigniting and propagating fronts. Therefore, it may not be satisfactory 
to use conventional PDF/FDF models to predict reaction rates, and advanced data-driven 
techniques like machine learning may be a suitable alternative for improving 
the accuracy. De Frahan et al. (2019) compared the performance of three different 
machine learning techniques, viz., random forests, which are a traditional ensemble 
method, deep neural networks (DNNs), and a conditional variational autoencoder 
(CVAE), which is also known as generative learning, to infer marginal FDFs of 
reaction progress variable in a swirling methane/air 
premixed flame and showed that the DNN is superior to the other two techniques. 
The DNN is an ANN with multiple hidden layers between input and output. 
Yao et al. (2020) built an MLP to obtain the mixture fraction marginal FDF for 
LES of turbulent spray flames and observed an order of magnitude improvement 
compared to those of the traditional presumed FDF approaches. Chen et al. (2021) 
employed a DNN to predict the joint FDF of mixture fraction and progress vari- 
able in MILD combustion conditions and showed that the DNN is generally able to 
capture the complex FDF behaviours and their variations with excellent accuracy, 
outperforming other presumed FDF models. 

This chapter aims to provide an overview of recent studies employing deep neural 
networks (interchangeably referred to as DNN, ANN or MLP hereafter) to infer 
subgrid-scale FDFs and reaction rates needed for LES of turbulent combustion under 
conventional and MILD conditions. A review of the Direct Numerical Simulation 
(DNS) data used to train these DNNs is also given. The chapter is structured as
follows. A recap of the treatment of FDFs in LES of turbulent combustion systems
is provided in Sect. 2. The DNS cases used as training datasets for the DNNs are
described in Sect. 3. The characteristics of the DNNs employed for the different
combustion cases are illustrated in Sect. 4. The main results in terms of FDF and
reaction rate predictions are discussed in Sect. 5. The conclusions are summarised
in Sect. 6.


2 FDF Modelling 


The filtered reaction rate appearing in the transport equation for a species filtered mass 
fraction or reaction progress variable needs a closure model and recent developments 
in various closure models are described in the book (Swaminathan et al. 2022) and 
review papers (Veynante and Vervisch 2002; Pitsch 2006). Earlier chapters of this 
book discuss the potential application of ML techniques to some of the reaction rate 
closures. In the presumed PDF approach, the filtered reaction rate is modelled as 
an integral of the product of a conditional reaction rate and an FDF (see Eq. 6). The
mixture fraction and the reaction progress variable are typically used as conditioning 
variables to signify the role of mixing and flame propagation on reaction rate (Bradley 
et al. 1998; Ihme and Pitsch 2008a). The conditional reaction rate may be estimated 
using one of the methods developed in past studies and these methods used canonical 
flames for chemistry tabulation, e.g., flamelet-generated manifolds (van Oijen and 
de Goey 2002), flame prolongation of intrinsic low dimensional manifold (Gicquel 
et al. 2000), conditional source term estimation method (Jin et al. 2008), or the 
solution of conditionally filtered equations for species mass fractions and energy via 
the conditional moment closure method (Klimenko and Bilger 1999). 

The subgrid variations in the conditioning variables about their filtered values 
are represented by the filtered density function (FDF). The FDF can generally be 
obtained by solving its transport equations using various approaches, e.g., Lagrangian 
particles (Pope 1985), Eulerian stochastic fields (Jones and Kakhi 1998), and multi- 
environment (Fox 2003). However, these approaches are computationally expensive
and thus a presumed FDF is often chosen instead (Pitsch 2006; Pope 2013) to save
computational costs. This presumed FDF approach needs only the statistical
moments, usually the mean and variance, of the key variables (mixture fraction, 
progress variable, flame stretch/straining, heat loss, etc., depending on the physical 
scenario of interest) to be transported and it is therefore much more economical. 

The β-PDF (Cook and Riley 1994) is the most commonly used presumed FDF
in LES of turbulent flames (Raman et al. 2005; Navarro-Martinez et al. 2005; Ihme
and Pitsch 2008b; Chen et al. 2017), and it usually provides a good approximation
of a conserved scalar distribution. The Favre-averaged FDF of the mixture fraction
$Z$ with a presumed β-distribution is calculated as

$$\tilde{P}_\beta(\xi; \tilde{Z}, \sigma_Z^2) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, \xi^{a-1} (1-\xi)^{b-1}, \tag{1}$$

where $\xi$ is the sample space variable for $Z$, $\tilde{Z}$ is the filtered mixture fraction and
$\sigma_Z^2 = \widetilde{Z''^2} = \widetilde{(Z-\tilde{Z})^2}$ is the mixture fraction subgrid variance. The parameters of
the $\Gamma$ function are $a = \tilde{Z}\,(1/g_Z - 1)$ and $b = (1-\tilde{Z})\,(1/g_Z - 1)$. The segregation
factor is $g_Z = \sigma_Z^2 / (\tilde{Z}(1-\tilde{Z}))$. The Favre-filtered FDF of the progress variable,
$\tilde{P}_\beta(\eta; \tilde{c}, \sigma_c^2)$, can also be presumed to follow a β distribution and obtained in a
similar manner using $\tilde{c}$ and $\sigma_c^2 = \widetilde{c''^2} = \widetilde{(c-\tilde{c})^2}$. The joint FDF of $\xi$ and $\eta$ can be
modelled as

$$\tilde{P}(\xi, \eta) = \tilde{P}_\beta(\xi; \tilde{Z}, \sigma_Z^2)\, \tilde{P}_\beta(\eta; \tilde{c}, \sigma_c^2), \tag{2}$$


assuming that there is a weak correlation between the subgrid fluctuations of $Z$ and
$c$. Such an assumption has been widely accepted for LES of conventional combustion
(Pitsch 2006; Veynante and Vervisch 2002). However, stronger subgrid correlations
of scalar fluctuations can occur in MILD combustion (Minamoto et al. 2014)
and hence the above assumption may not be universally applicable. Other analytical
distributions have been considered in past studies (Grout et al. 2009; Darbyshire
and Swaminathan 2012; Linse et al. 2014). Darbyshire and Swaminathan (2012)
proposed a correlated joint PDF model using the Plackett copula (Plackett 1965) to
include the covariance of $Z$ and $c$ in RANS calculations. The covariance, $\sigma_{Zc}$, written
as $\sigma_{Zc} = \widetilde{(Z-\tilde{Z})(c-\tilde{c})}$, is used in the copula method to obtain a joint PDF from
the univariate marginal distributions, $P_\beta(Z)$ and $P_\beta(c)$. For non-zero values of $\sigma_{Zc}$,
the correlated joint PDF is calculated as

$$P(Z, c) = \frac{\theta\, P_\beta(Z)\, P_\beta(c)\, (A - 2B)}{\left(A^2 - 4\theta B\right)^{3/2}}, \tag{3}$$

with

$$A = 1 + (\theta - 1)\left[C_\beta(Z) + C_\beta(c)\right], \tag{4}$$

and

$$B = (\theta - 1)\, C_\beta(Z)\, C_\beta(c), \tag{5}$$

where $C_\beta$ is the β cumulative distribution function (CDF) and $\theta$ is the odds ratio
calculated using a Monte Carlo approach (Ruan et al. 2014). The copula method has
been used in RANS calculations of stratified premixed and lifted jet flames (Ruan
et al. 2014; Chen et al. 2015) showing improved prediction of the lift-off height with
respect to the double-β PDF given in Eq. (2).
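
As a minimal sketch of Eqs. (1)-(5), the Python fragment below converts a filtered mean and subgrid variance into the β-PDF parameters a and b and evaluates the Plackett copula density that multiplies the two marginals. The function names, the midpoint-rule CDF and the moment values used when calling the functions are illustrative assumptions and are not taken from the cited studies. In particular, the odds ratio θ is supplied by hand here rather than computed with the Monte Carlo approach of Ruan et al. (2014).

```python
import math

def beta_params(mean, var):
    """Return (a, b) of the beta-PDF from a filtered mean and subgrid variance."""
    g = var / (mean * (1.0 - mean))      # segregation factor, 0 < g < 1
    a = mean * (1.0 / g - 1.0)
    b = (1.0 - mean) * (1.0 / g - 1.0)
    return a, b

def beta_pdf(xi, a, b):
    """Beta-PDF of Eq. (1) for 0 < xi < 1, evaluated through log-gamma."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(xi)
                    + (b - 1.0) * math.log(1.0 - xi))

def beta_cdf(x, a, b, n=2000):
    """Crude midpoint-rule CDF (assumption: adequate for a sketch when a, b > 1)."""
    h = x / n
    return sum(beta_pdf((i + 0.5) * h, a, b) for i in range(n)) * h

def plackett_density(u, v, theta):
    """Copula density theta*(A - 2B) / (A^2 - 4*theta*B)^(3/2), Eqs. (3)-(5)."""
    A = 1.0 + (theta - 1.0) * (u + v)
    B = (theta - 1.0) * u * v
    return theta * (A - 2.0 * B) / (A * A - 4.0 * theta * B) ** 1.5

def joint_pdf(z, c, mom_z, mom_c, theta):
    """Correlated joint PDF of Z and c; theta = 1 recovers the product of Eq. (2)."""
    az, bz = beta_params(*mom_z)
    ac, bc = beta_params(*mom_c)
    u, v = beta_cdf(z, az, bz), beta_cdf(c, ac, bc)
    return beta_pdf(z, az, bz) * beta_pdf(c, ac, bc) * plackett_density(u, v, theta)
```

For example, a filtered mean of 0.3 with a subgrid variance of 0.01 gives g = 0.01/0.21, hence a = 6 and b = 14. Setting θ = 1 makes the copula density equal to one, so the joint PDF reduces to the uncorrelated product.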

In presumed-FDF approaches, the subgrid reaction rate is obtained as

$$\overline{\dot{\omega}} = \int_0^1 \!\! \int_0^1 \langle \dot{\omega} \mid Z, c \rangle\, P(Z, c;\, \tilde{Z}, \tilde{c}, \sigma_Z^2, \sigma_c^2)\, \mathrm{d}Z\, \mathrm{d}c, \tag{6}$$

and this approach reduces the computational cost significantly for LES by using a
presumed FDF in the above equation. However, the presumed FDF shapes obtained using
classical functions, for example a bimodal delta function, may not be fully satisfactory
for situations such as (i) MILD combustion conditions, (ii) when there are evaporating
droplets, and (iii) when the burnt or burning mixture is inhomogeneous leading
to significant statistical correlation between $Z$ and $c$ (Chen et al. 2018). To overcome
these issues, machine learning algorithms have been employed in recent studies to
construct predictive models for the scalar PDFs/FDFs. A deep neural network (DNN),
among other ML techniques tested, was shown to be better than a joint β-function model
in inferring subgrid FDFs in a swirling methane-air premixed flame (de Frahan et al.
2019). This behaviour was also demonstrated for MILD combustion (Chen et al.
2021) and turbulent spray flames (Yao et al. 2020). These tests were conducted using
respective direct numerical simulation (DNS) datasets. DNS can be seen as a vir- 
tual experiment resolving all the relevant length and time scales without turbulence 
modelling. Thus, it is a powerful tool for investigating combustion models. It is quite 
straightforward to obtain filtered quantities from DNS data by applying appropriate 
filtering operations (Pope 2000) and these can be used as input to ML algorithms 
such as DNN. The data extraction and its processing prior to using them for DNN 
training are important steps which can play a role to improve accuracy and generality 
of the neural networks. Details about these steps, along with the main features of the 
cases studied in de Frahan et al. (2019), Chen et al. (2021) and Yao et al. (2020), 
are discussed in the following sections. Details on the respective DNS cases can be 
found in those studies as the objective here is on the use of ML techniques. 
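
In discrete form, the integral in Eq. (6) reduces to a weighted sum over the sample-space bins. The sketch below illustrates this under stated assumptions: the conditional rate table and the discrete FDF are hypothetical nested lists, and each FDF entry is a bin probability (bin area already included), so that all entries sum to one.

```python
def filtered_rate(cond_rate, fdf):
    """Discrete analogue of Eq. (6): sum over (Z, c) bins of <w|Z,c> * P_bin.

    cond_rate[i][j] : conditional reaction rate in bin (i, j)
    fdf[i][j]       : probability mass in bin (i, j); all entries sum to 1
    """
    total_mass = sum(sum(row) for row in fdf)
    if abs(total_mass - 1.0) > 1e-8:
        raise ValueError("discrete FDF must sum to one")
    return sum(w * p
               for w_row, p_row in zip(cond_rate, fdf)
               for w, p in zip(w_row, p_row))

# Hypothetical 2 x 3 example: reaction is confined to the middle c bin.
cond_rate = [[0.0, 40.0, 0.0],
             [0.0, 100.0, 0.0]]
fdf = [[0.25, 0.25, 0.0],
       [0.0, 0.25, 0.25]]
rate = filtered_rate(cond_rate, fdf)   # 0.25 * 40 + 0.25 * 100
```

Only the bins that carry both probability mass and a non-zero conditional rate contribute, which is why the shape of the FDF matters so much for the filtered rate.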


3 DNS Data Extraction and Manipulation 


Three combustion cases are considered in this chapter: a low-swirl premixed 
methane-air flame investigated in de Frahan et al. (2019), methane-air combustion 
under MILD conditions studied in Chen et al. (2021), and a turbulent kerosene spray 
flame used in Yao et al. (2020). The corresponding DNS setups and data preparation 
procedures are described next. 


3.1 Low-Swirl Premixed Flame 


The DNS dataset considered by de Frahan et al. is a snapshot of a quasi-stationary 
simulation of an experimental low-swirl, premixed methane-air burner (Day et al. 
2012). In this setup, a nozzle imposes a low swirl to a CH4/air mixture with fuel-air 
equivalence ratio $\phi$ = 0.7 at the inflow. The nozzle region is surrounded by a co-
flow of cold air. A lifted premixed flame with its partially burnt mixture reacting with 
co-flow air in downstream locations was observed in the experiments. The presence 
of this multi-regime burning introduces challenges for modeling the joint FDF of 
mixture fraction and progress variable. Training ML models with such a DNS dataset
has additional advantages such as using diverse subsets as training data, avoiding
overfitting, and increasing the opportunities for model generalisation. The training
sets were constructed by selecting different subvolumes, indicated by V as in Fig. 1,
spanning from the premixed combustion region to the downstream zone containing mixing
of premixed combustion products with co-flow air. de Frahan et al. (2019) used a
single time snapshot at t = 0.0626 s from the DNS to demonstrate the capabilities 
of ML for FDF modelling. In the context of LES, the FDF at a given point and time 
can be extracted by applying fine-grained filtering to DNS or experimental data at a 
given instant (Pope 1990). In each subvolume, sample moments and the associated 
FDF were thus obtained by using a discrete box filter: 


$$\bar{\psi}(x, y, z) = \frac{1}{n_f^3} \sum_{i=-n_f/2}^{n_f/2}\; \sum_{j=-n_f/2}^{n_f/2}\; \sum_{k=-n_f/2}^{n_f/2} \psi(x + i\Delta x,\; y + j\Delta x,\; z + k\Delta x), \tag{7}$$

where $\psi$ is the quantity of interest, $n_f$ is the number of points in the discrete box filter,
$\Delta = 32\Delta x$ is the filter size, and $\Delta x$ = 100 μm is the smallest spatial cell size in the
DNS (six times smaller than the laminar flame thickness). Four sample moments
of the joint FDF, i.e., $\tilde{Z}$, $\sigma_Z^2$, $\tilde{c}$, $\sigma_c^2$, which are the Favre-filtered mixture fraction, its
subgrid scale (SGS) variance, the progress variable and its SGS variance, were extracted
for each subvolume. The filter size was chosen to be representative of a typical LES
filter scale (Pitsch 2006) and to ensure adequate samples to construct the FDF. The
filters were spaced $8\Delta x$ apart, leading to 58800 FDFs for each subvolume.
The mixture fraction $Z$ was defined using nitrogen mass fraction so that it took
a value of 1 in the burner stream and 0 in the co-flow air. The progress variable,
varying between 0 and 0.21, was defined using mass fractions of CO2, CO, H2O and
H2 as $c = Y_{CO_2} + Y_{CO} + Y_{H_2O} + Y_{H_2}$. The density-weighted FDFs of $Z$ and $c$ were
constructed using 64 bins in $Z$ space and 32 bins in $c$ space, which gives a vector
of 2048 values to describe a single joint FDF. The conditional means of the reaction
rate $\langle \dot{\omega} \mid Z, c \rangle$ were also extracted for each sample with an identical discretisation.
Prior to training, the sample moments were independently centred by subtracting
the median and scaled by dividing the data by the range between the 25th and 75th
percentiles. It is known that appropriate centring and scaling are generally beneficial
for ML algorithms (Goodfellow et al. 2016). According to the authors this centring
and scaling were robust to outliers. The samples from a volume $V_i$ were randomly
split among two distinct datasets: a training dataset, $\mathcal{D}_t^i$, and a validation dataset, $\mathcal{D}_v^i$,
comprising 5% of the total samples, as illustrated in Fig. 1.
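
The median and interquartile-range scaling described above can be sketched as follows. This is an illustration only: the function name, the sample values and the quartile interpolation convention (the "inclusive" method of Python's statistics module) are assumptions, not details reported by de Frahan et al. (2019).

```python
import statistics

def robust_scale(column):
    """Centre by the median and scale by the interquartile range (Q3 - Q1)."""
    med = statistics.median(column)
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    return [(x - med) / iqr for x in column]

# Hypothetical filtered mixture-fraction samples with one outlier (0.95).
z_tilde = [0.10, 0.12, 0.14, 0.16, 0.18, 0.95]
scaled = robust_scale(z_tilde)
```

Because the median and the quartiles barely move when a single extreme sample is added, the bulk of the data keeps a spread of order one after scaling, whereas a mean and standard deviation would be pulled towards the outlier.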


3.2 MILD Combustion 


Fig. 1 Illustration of data generation procedure for V5

The MILD combustion DNS dataset of Doan et al. (2018) was used to study the
application of DNN for inferring subgrid FDF in MILD combustion by Chen et al. (2021).
A cube of size $L_x \times L_y \times L_z$ = 10 × 10 × 10 mm³ was used to conduct DNS of
turbulent combustion of inhomogeneous methane-air mixtures diluted with exhaust
gases. A spatial resolution of $\delta x \approx$ 20 μm obtained using 512 points distributed
uniformly in each direction was observed to be sufficient to resolve the turbulent and
chemical length scales of interest as described in Doan et al. (2018). The simulation
was run for 1.5 flow-through times, $t_f$, defined in Minamoto and Swaminathan
(2015). Further detail on the DNS procedure and datasets can be found in Doan
et al. (2018). Three cases, viz., AZ1, AZ2 and BZ1, with different mixing length
scales and dilution levels were considered for the DNN training. The conditioning
variables for the FDF analyses were the Bilger mixture fraction (Bilger 1976) and a
temperature-based reaction progress variable, $c_T$, defined as


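$$c_T = \frac{T - T_r}{T_p(Z) - T_r}, \tag{8}$$

where $T_r$ is 1500 K and the value of the burnt mixture temperature $T_p$ depends on $Z$ and
can be obtained using MILD Flame Element (MIFE) laminar calculations (Minamoto
and Swaminathan 2014). Favre-filtered fields were extracted from the DNS by applying
a low-pass box filter. For example, the Favre-filtered mixture fraction $\tilde{Z}$ was
obtained as

$$\tilde{Z}(\mathbf{x}, t) = \frac{1}{\bar{\rho}(\mathbf{x}, t)} \int_{\mathbf{x}-\Delta/2}^{\mathbf{x}+\Delta/2} \rho(\mathbf{x}', t)\, Z(\mathbf{x}', t)\, \mathrm{d}\mathbf{x}', \tag{9}$$

where the overbar and the tilde denote the Reynolds and Favre filtering respectively, $\rho$ is
the mixture density and $\Delta$ is the filter width. The position vectors are $\mathbf{x}$ and $\mathbf{x}'$. The
subgrid variance was obtained as

$$\sigma_Z^2(\mathbf{x}, t) = \frac{1}{\bar{\rho}(\mathbf{x}, t)} \int_{\mathbf{x}-\Delta/2}^{\mathbf{x}+\Delta/2} \rho(\mathbf{x}', t) \left[ Z(\mathbf{x}', t) - \tilde{Z}(\mathbf{x}, t) \right]^2 \mathrm{d}\mathbf{x}'. \tag{10}$$

Similarly, the $\tilde{c}_T$ and $\sigma_{c_T}^2$ fields were calculated as above. The $Z$-$c_T$ joint FDF was
then computed as

$$\tilde{P}(\xi, \eta; \mathbf{x}, t) = \frac{1}{\bar{\rho}(\mathbf{x}, t)} \int_{\mathbf{x}-\Delta/2}^{\mathbf{x}+\Delta/2} \rho(\mathbf{x}', t)\, \delta\!\left[\xi - Z(\mathbf{x}', t)\right] \delta\!\left[\eta - c_T(\mathbf{x}', t)\right] \mathrm{d}\mathbf{x}', \tag{11}$$

where $\xi$ and $\eta$ were the sample-space variables of $Z$ and $c_T$ respectively, and $\delta[\cdot]$ is
the Dirac delta function. The discrete FDFs were obtained for a given point in a
given DNS snapshot by binning the $Z$ and $c_T$ samples in the corresponding filtering
subspace with 35 non-uniform bins in $Z$ space (clustered around the stoichiometric
value) and 31 uniform bins in $c_T$ space. The subgrid-scale covariance, $\sigma_{Zc_T}$, also
used by the copula model, was computed as

$$\sigma_{Zc_T}(\mathbf{x}, t) = \frac{1}{\bar{\rho}(\mathbf{x}, t)} \int_{\mathbf{x}-\Delta/2}^{\mathbf{x}+\Delta/2} \rho(\mathbf{x}', t) \left[ Z(\mathbf{x}', t) - \tilde{Z}(\mathbf{x}, t) \right] \left[ c_T(\mathbf{x}', t) - \tilde{c}_T(\mathbf{x}, t) \right] \mathrm{d}\mathbf{x}'. \tag{12}$$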


The filtered scalar fields $\tilde{Z}$, $\tilde{c}_T$, $\sigma_Z^2$, $\sigma_{c_T}^2$ and $\sigma_{Zc_T}$ formed the DNN input matrix
X. The unfiltered $\rho$, $Z$ and $c_T$ fields were used to obtain the Favre filtered FDFs
required for the target matrix Y. The procedure is shown schematically in Fig. 2 for
a snapshot of case AZ1. The filtered fields are presented in 2D with the thin DNS
grid-lines for visual clarity. The indices i, j and k pertain to the x, y and z directions
in 3D space, respectively, and are assigned to each “LES filter cube” indicated by
a red box in Fig. 2. The total number of samples taken in each direction is $n_{cube}$.
The effects of filter size were also investigated by considering a range of filter sizes
relevant to typical LES. The filter sizes were normalized using the thermal thickness
of the stoichiometric MIFE, $\delta_{th}$ = 1.6 mm. A filter size of $\Delta = 80\,\delta x$ corresponded
to $\Delta^+ = \Delta/\delta_{th} = 1$. The extracted matrices X and Y were flattened to be two-
dimensional, with as many rows as the number of samples and as many columns as
the number of features. The input matrix X had 5 columns, while the target matrix
Y had 1085 columns, obtained from the discretisation step mentioned above.
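
The construction of one row of X and one row of Y from the DNS points inside a single filter cube can be sketched as below. This is a simplified illustration of Eqs. (9)-(12) in discrete form, not the processing code of Chen et al. (2021). The function names, the three sample points and the uniform Z bin edges are assumptions (the study clustered the Z bins around the stoichiometric value).

```python
def favre_moments(rho, z, c):
    """Density-weighted means, variances and covariance over one filter cube."""
    mass = sum(rho)
    z_f = sum(r * zi for r, zi in zip(rho, z)) / mass
    c_f = sum(r * ci for r, ci in zip(rho, c)) / mass
    var_z = sum(r * (zi - z_f) ** 2 for r, zi in zip(rho, z)) / mass
    var_c = sum(r * (ci - c_f) ** 2 for r, ci in zip(rho, c)) / mass
    cov = sum(r * (zi - z_f) * (ci - c_f)
              for r, zi, ci in zip(rho, z, c)) / mass
    return [z_f, c_f, var_z, var_c, cov]          # one row of X (5 features)

def discrete_fdf(rho, z, c, z_edges, c_edges):
    """Density-weighted 2D histogram, flattened into one row of Y (sums to 1)."""
    nz, nc = len(z_edges) - 1, len(c_edges) - 1
    hist = [0.0] * (nz * nc)
    for r, zi, ci in zip(rho, z, c):
        # Bin index = number of upper edges not exceeding the sample value.
        i = min(sum(zi >= e for e in z_edges[1:]), nz - 1)
        j = min(sum(ci >= e for e in c_edges[1:]), nc - 1)
        hist[i * nc + j] += r
    mass = sum(rho)
    return [h / mass for h in hist]

# Hypothetical cube with three DNS points; 35 x 31 bins give 1085 targets.
rho = [1.0, 2.0, 1.0]
z = [0.1, 0.2, 0.3]
c = [0.5, 0.5, 1.0]
z_edges = [k / 35 for k in range(36)]   # uniform here for simplicity
c_edges = [k / 31 for k in range(32)]
x_row = favre_moments(rho, z, c)
y_row = discrete_fdf(rho, z, c, z_edges, c_edges)
```

Stacking such rows for every filter cube and snapshot gives the two-dimensional matrices X and Y described above.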

Centring and scaling of the input matrix X were performed as follows: each
column vector, having $n_{cube}^3$ elements, was centred by subtracting its mean and scaled
by dividing by its standard deviation. Centring and scaling were not applied to the
output matrix Y. However, to address the issue of having unbounded values of the
FDFs, the discrete density function values were considered. As such, every number
in Y varied between 0 and 1, and the sum of the elements in each target row is equal
to 1.
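
A minimal sketch of the column-wise centring and scaling of X is given below. The function name and the small example matrix are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics

def standardise_columns(X):
    """Centre each column of X by its mean and scale by its standard deviation."""
    cols = list(zip(*X))
    means = [statistics.fmean(col) for col in cols]
    stds = [statistics.pstdev(col) for col in cols]   # population estimator
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]

# Hypothetical input matrix with three samples and two features.
X = [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 30.0]]
X_scaled = standardise_columns(X)
```

After this step both columns have zero mean and unit standard deviation, so features with very different magnitudes, such as a filtered mean and a subgrid variance, enter the network on a comparable scale.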

Fig. 2 Schematic demonstration of the construction of the DNN input and target matrices (Chen
et al. 2021)

Subsequent to the scaling procedures, a dimensionality reduction technique,
Principal Component Analysis (PCA), discussed in chapter “Reduced-Order Modeling
of Reacting Flows Using Data-Driven Approaches”, was used to identify and
remove the outliers in the training data. Two types of outliers, viz., leverage and
orthogonal (Verdonck et al. 2009), were determined and discarded. Details about
the identification and removal step are provided in Chen et al. (2021). Once leverage
and orthogonal outliers were removed from the dataset, the DNN training was
performed on the remaining observations as discussed in Sect. 4.2.


3.3 Spray Combustion 


Carrier-phase DNS (CP-DNS) data of turbulent spray flames were used to build a 
deep learning training database for mixture fraction FDF predictions. In carrier-phase 
DNS, the flow field is resolved with a point source approximation for the droplets, 
thus all relevant scales of the fluid phase are resolved except the boundary layers 
around individual particles. The governing equations of the gas phase are solved in 
the Eulerian framework and coupled with a Lagrangian solver for displacement, size, 
and temperature of the droplets. An equilibrium state of the liquid and the vapor at 
the interface was assumed. A full description of the governing equations is provided 
in Yao et al. (2020). The computational domain is a rectangular box, discretised by a
mesh with 192 × 128 × 128 cells having $\delta_{DNS}$ = 100 μm. This grid size ensured a
sufficient resolution of the small scale structures of the flow field (Pope 2000), whereas a
finer resolution could compromise the point particle assumptions of the liquid phase.
Kerosene droplets (treated as single-component C12H23) were randomly injected into
humid air, representative of experimental (Khan et al. 2007; Wang et al. 2018) and
numerical (Wright et al. 2005; Giusti et al. 2018) setups. A homogeneous isotropic
turbulent velocity field, calculated by a modified von Karman spectrum (Wang et al.
2019), was imposed at the inlet. The progressive kerosene droplet evaporation led to
an ignitable mixture that promoted a statistically planar turbulent partially premixed 
flame. Further downstream, the hot post-flame temperatures led to reduced turbu- 
lence levels due to higher viscosity and a sudden evaporation of remaining droplets 
that could penetrate the flame. This lack of homogeneity and the presence of a source 
term for the mixture fraction are prone to make the existing FDF models (O’Brien 
and Jiang 1991; Cook and Riley 1994) inaccurate. 

Fig. 3 Simulation setup of CP-DNS (solid points: droplets; the gas phase is colored by temperature)
and an LES filter box (Yao et al. 2020)

Filter boxes were used for post-processing of CP-DNS data to group several
DNS cells into one LES cell. A filter box example is shown in Fig. 3 along with the
DNS domain and setup, and the simulated temperature contour. The mixture fraction
FDF $P(\eta)$ was computed from DNS data using a mixture fraction binning, with a
bin size of 0.01 for all DNS cells lying within a specific LES cell. Favre filtering
was used to extract LES quantities that were employed as input variables for the
ANN. According to Klimenko and Bilger (1999), the following input quantities were
found to have an effect on the mixing statistics and were thus considered: mixture
fraction $\xi$, eddy viscosity $\nu_t$, turbulence dissipation rate $\epsilon_t$, diffusion coefficient $D$,
density $\rho$, spray evaporation rate $J_m$, relative velocity between the droplet and the
surrounding gas $U_d$ and droplet number density $C$. The turbulence dissipation rate
was replaced by the more easily available strain rate $|S_{ij}|$. All the DNN inputs were
filtered and Favre averaged. Therefore, the input features are commonly accessible
in a typical LES of spray combustion. Moreover, Wang et al. concluded in their study
that these parameters sufficiently characterize the mixture fraction FDF in turbulent
spray flames. To ensure the reliability of the DNN for a reasonable range of LES
meshes, the authors investigated the following LES filter sizes: $(\Delta_{LES})^3 = (8\delta_{DNS})^3$,
$(\Delta_{LES})^3 = (16\delta_{DNS})^3$ and $(\Delta_{LES})^3 = (32\delta_{DNS})^3$. The final database is a combination of
data samples with different $\Delta_{LES}$. The performance of the DNN for data samples
using different LES filter boxes was assessed. The output target was set to be a
vector of 60 elements covering $\xi$ in [0, 0.6], as $\xi_{max} < 0.6$ in the spray flame
simulations. The binning procedure can lead to empty bins, especially
for small $\Delta_{LES}$. To avoid this, missing values were replaced by values interpolated
with the Stineman method, which is widely used in statistics to deal with
missing values as it preserves the monotonicity of data and prevents introducing
spurious oscillations (Stineman 1980). It was found that the commonly used zero-
padding operation, which fills in blank data with zeros, is not applicable as the DNN
would be misled and learn erroneous patterns. A total of 18 simulation cases were
run to form the full database for training and validation purposes. The validation
(test) dataset consisted of five simulation cases, resulting in a test/train ratio of about
0.38. These datasets included parameter ranges that approximate conditions to be
expected in real spray flames and were used for the a priori validation presented in
Sect. 5.
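
The idea of filling empty FDF bins by interpolation rather than zero-padding can be sketched as follows. Note the substitution: plain linear interpolation is used here as a simple stand-in, whereas the study used the Stineman method, which is not available in the Python standard library. The function name, the use of None to mark empty bins and the example values are all assumptions made for illustration.

```python
def fill_empty_bins(values):
    """Replace None entries by linear interpolation between filled neighbours.

    Linear interpolation is a stand-in for the Stineman method used in the
    study. Leading and trailing gaps copy the nearest filled value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one bin must be filled")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = (1.0 - w) * values[left] + w * values[right]
    return filled

# Hypothetical FDF values with three empty bins.
pdf_bins = [0.0, 2.0, None, None, 8.0, None]
filled = fill_empty_bins(pdf_bins)
```

With zero-padding the two interior gaps would appear as a spurious dip to zero between 2.0 and 8.0, which is the kind of erroneous pattern the text warns about.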

To recap, the three studies selected several DNS cases to construct a heterogeneous 
training set. If only one DNS case was available then several subdomains within the 
DNS domain were selected. Chen et al. (2021) considered one additional DNN input 
feature, i.e., the scalar covariance, to the input set chosen by de Frahan et al. (2019). 
Yao et al. (2020) chose different DNN input features specifically for spray combus- 
tion. No scaling was adopted by Yao et al., whereas two different scaling methods 
were implemented in the other studies. Only Chen et al. adopted an outlier removal
step based on a dimensionality reduction technique. Discrete density functions, bounded
between 0 and 1, were the DNN target in de Frahan et al. (2019) and Chen et al.
(2021) while Yao et al. (2020) considered probability density function values. The
review of these studies shows that no unique algorithm needs to be adopted to prepare
the input data for an ML model. The only common goal that needs to be pursued is to
construct an input dataset that is as heterogeneous as possible to increase the gener- 
alisation, also known as transfer learning, of the trained ML models. The similarities 
and differences of the DNNs used in these three studies are discussed next.


4 Deep Neural Networks for Subgrid-Scale FDFs


A standard neural network consists of many simple connected functional units, called 
neurons. Each neuron receives an input which is processed through activation func- 
tions to produce an output. Multiple neurons can be combined to form fully connected 
networks, which are called artificial neural networks (ANNs) since they mimic the 
neuron arrangements in the human brain. Feed-forward networks, also called multi- 
layer perceptrons (MLPs), are classic ANN structures, and they are composed of 
layers of neurons, where a weighted output from one layer is the input to the next 
layer. The first layer of the MLP accepts a vector as input and the elements of this 
vector are known as features. The final output of the MLP is the target quantity of 
interest. The layer providing the final MLP output is called the output layer, while the
other layers in the network are called hidden layers. From a mathematical perspective
(Goodfellow et al. 2016), the MLP defines a mapping from the input $x$ to the output
$y = f(x, \theta)$, where $\theta$ are the trainable network parameters. Each
neuron is a functional unit that is generally described by

$$y = \phi(x^T \omega + b), \tag{13}$$

where $\omega$ and $b$ are the weights and bias vector, and $\phi$ is the activation function
(see Sect. 2.3.7.2, Chap. 2, this volume), which provides great flexibility to ANNs
by introducing non-linearity to an otherwise linear relationship between input and
output. There are several activation functions and some of these will be introduced
and described later. The weight $\omega$ is a matrix of the size $k \times m$, whereas the bias $b$ is
a vector of $m$ elements. For each layer, $k$ is the number of inputs received from the
preceding layer and $m$ is the number of neurons in the current layer. $\omega$ and $b$ contain
the trainable parameters of the network. The training of ANNs pursues the objective
of minimizing a target loss function

$$L(x, \theta) = \mathcal{G}\left(f(x, \theta) - f^*\right), \tag{14}$$

where $\mathcal{G}$ is any measure of the difference between the modeled value $f$ and the
real value $f^*$. The most commonly used loss functions are the mean absolute error
(MAE) and the mean squared error (MSE). Gradient-based optimization methods, with
the gradients computed by backward propagation (Rumelhart et al. 1986), are used to
identify the network weights that minimize the error between predictions and labeled
training data. The training step gives the optimized set of weights. The MLP is a design
that is suitable
for regression problems, whereas other types of ANNs, such as Convolutional Neural 
Network (CNN) and Recurrent Neural Network (RNN), have been extensively used 
in processing image data and time series problems, etc., see Sect. 2.3.7.2 (Chap. 2, 
this volume), for further detail. A schematic of the MLP architecture with input, 
hidden, and output layers is shown in Fig. 4 as an example.

Fig. 4 A schematic of a 3-layer MLP architecture
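
The layer operation of Eq. (13) and the two loss measures mentioned above can be written out in a few lines. This is a bare-bones sketch in plain Python with hypothetical weights, intended only to make the shapes explicit (a k × m weight matrix and a bias of m elements); practical implementations rely on a deep learning library.

```python
def dense(x, w, b, act):
    """One layer, Eq. (13): y_j = act(sum_i x_i * w[i][j] + b_j); w is k x m."""
    k, m = len(w), len(w[0])
    if len(x) != k or len(b) != m:
        raise ValueError("shape mismatch between x, w and b")
    return [act(sum(x[i] * w[i][j] for i in range(k)) + b[j]) for j in range(m)]

def relu(s):
    return max(0.0, s)

def identity(s):
    return s

def mse(pred, target):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mae(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output.
x = [1.0, 2.0]
w1 = [[1.0, -1.0, 0.5],
      [0.5, 1.0, -1.0]]
b1 = [0.0, 0.5, 0.0]
w2 = [[1.0], [2.0], [3.0]]
b2 = [0.1]
hidden = dense(x, w1, b1, relu)            # third neuron is switched off
output = dense(hidden, w2, b2, identity)
```

Chaining calls to dense in this way is all that a feed-forward pass amounts to; training then adjusts w1, b1, w2 and b2 to reduce the chosen loss.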


4.1 Low-Swirl Premixed Flame 


A feed-forward fully connected DNN with three layers (two hidden layers and an
output layer) was trained by de Frahan et al. (2019) to predict the joint subfilter FDF
of mixture fraction and progress variable. There were 256 and 512 neurons in the two
hidden layers and the neurons had a leaky rectified linear unit (LeakyReLU) activation
function:

$$y_i = \begin{cases} x_i & \text{if } x_i \ge 0 \\ \alpha x_i & \text{otherwise} \end{cases} \tag{15}$$

where $x_i$ is the weighted sum of the neuron input, $y_i$ is its output, and $\alpha$, usually
equal to 0.01, is the slope. A LeakyReLU activation function avoids mapping negative
input to zero values, unlike its parent function ReLU, which has $\alpha = 0$. A large
weight update during training can cause the summed input to be always negative
regardless of the network input. A neuron featuring a ReLU function will then output
a zero value, leading to the dying ReLU case, in which the neuron neither contributes
to the gradient-based optimization nor adjusts its weights. Furthermore, similar to the
vanishing gradients problem, the learning can be slow when training ReLU networks
that stumble on constant zero gradients. The leaky rectifier allows for a small, non-zero
gradient when the unit is saturated and not active. Additionally, each hidden layer is
followed by a batch normalization layer (Ioffe and Szegedy 2015) and this technique
has been widely used to build deep networks as it leads to speed and performance
improvements. It applies the following function:


$$y_i = \frac{x_i - \mu_x}{\sqrt{\sigma_x^2 + \epsilon}}\, \gamma + \delta, \tag{16}$$




where $x_i$ and $y_i$ are the i-th elements of the layer input and output vectors respectively.
These vectors are of size $n$ having a mean and variance of $\mu_x = \frac{1}{n}\sum_{i=1}^{n} x_i$ and
$\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_x)^2$. A small real number $\epsilon$ is used to maintain numerical
stability. Both $\gamma$ and $\delta$ are learnable parameter vectors of size $n$ and they are updated
iteratively during training for optimization purposes. de Frahan et al. (2019) chose
$\epsilon = 10^{-5}$ and a moving average of $\mu_x$ and $\sigma_x$ computed during training with a decay
of 0.1 (or, equivalently, momentum of 0.9).
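
The LeakyReLU function of Eq. (15) and the normalization of Eq. (16) are simple enough to write directly. The sketch below follows the equations as stated in the text, with the statistics taken over the n elements of the input vector. The function names, the default value of eps and the example vectors are assumptions, and the moving averages used at inference time are not included.

```python
import math

def leaky_relu(x, alpha=0.01):
    """Eq. (15): identity for x >= 0, small slope alpha otherwise."""
    return x if x >= 0.0 else alpha * x

def normalise(x, gamma, delta, eps=1e-5):
    """Eq. (16) applied to a vector x of size n with scale gamma and shift delta."""
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n
    return [g * (xi - mu) / math.sqrt(var + eps) + d
            for xi, g, d in zip(x, gamma, delta)]

# Hypothetical pre-activations of three neurons.
activated = [leaky_relu(s) for s in [-2.0, 0.0, 2.0]]
normed = normalise([1.0, 2.0, 3.0], gamma=[1.0] * 3, delta=[0.0] * 3)
```

The negative input is mapped to -0.02 rather than to zero, so its gradient with respect to the input is alpha instead of vanishing, which is the property that avoids the dying ReLU case described above.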

The DNN inputs are the four moments of the joint FDF, viz., $\tilde{Z}$, $\sigma_Z^2$, $\tilde{c}$, and $\sigma_c^2$,
whereas the outputs are a total of 2048 FDF values obtained from the discretisation of
the joint FDF of mixture fraction $Z$ and progress variable $c$ as described in Sect. 3.1.
Thus, an output layer having 2048 neurons, as many as the number of outputs, was
considered in de Frahan et al. (2019). The output layer features a softmax activation
function:

$$y_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}, \tag{17}$$

where $x_i$ and $y_i$ are defined as for Eq. 16. This type of activation function ensures
that $\sum_{i=1}^{n} y_i = 1$ and $y_i \in [0, 1]\ \forall i$. The loss function used was the binary cross
entropy between the target $y$ and the prediction $\hat{y}$ and this function is

$$L(\hat{y}, y) = -\frac{1}{n} \sum_{i=1}^{n} \left( y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right), \tag{18}$$


representing a proper metric for measuring the difference between two probability 
distributions. The total number of trainable parameters was 1.1 M. The training 
was performed over 500 epochs, i.e., 500 training loops through the entire training 
data. For each epoch, the training data is fully shuffled and divided into batches 
with 64 training samples per batch. All trainable parameters are updated after each 
epoch. A split of 95/5% between training and validation samples was applied on 
the entire dataset. The loss function is computed on the validation samples which 
are not part of the training process. Thus, the validation loss is the true indicator of 
the ANN’s performance and provides hints regarding its generality. It is a common 
practice to track the losses during both training and validation steps continuously 
to check if the losses are decreasing over each epoch by studying learning curves 
(a plot of loss vs epoch number). These learning curves can be used to diagnose an 
underfit, overfit, or well-fit model and whether the training or validation datasets are 
not representative of the problem domain. Good ANN training gives loss curves
that decrease continuously until a plateau is reached where the difference between
the training and validation losses is small. de Frahan et al. (2019) chose the Adam
optimizer (Kingma and Ba 2014), which is a gradient descent algorithm, with an
initial learning rate of $10^{-4}$. The learning rate is a dimensionless parameter that
determines the step size of the stochastic gradient descent used to adjust the weights.
The Adam optimizer is more sophisticated than traditional stochastic gradient
descent in that it uses a per-parameter learning rate, which can also be adapted during
the training (Kingma and Ba 2014).
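As a minimal sketch of the output activation and loss described above, the following Python snippet evaluates a softmax and a binary cross entropy on a handful of values. The input numbers are hypothetical, the loss is assumed to be averaged over the n outputs, and the clipping constant `eps` is an implementation detail added here for numerical safety; none of it is taken from the code of de Frahan et al. (2019).

```python
import math

def softmax(x):
    """Map raw outputs x_i to y_i = exp(x_i) / sum_j exp(x_j)."""
    shift = max(x)  # subtracting the maximum avoids overflow and leaves y unchanged
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy between target and predicted values."""
    n = len(y_true)
    loss = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        loss += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -loss / n

raw_outputs = [2.0, 1.0, 0.1, -1.0]  # hypothetical pre-activation values
target_fdf = [0.7, 0.2, 0.1, 0.0]    # hypothetical discretised target FDF
predicted_fdf = softmax(raw_outputs)

print("sum of softmax outputs:", sum(predicted_fdf))
print("loss:", binary_cross_entropy(target_fdf, predicted_fdf))
```

The softmax outputs sum to one by construction, which is what makes this activation a convenient choice when the network output is a discretised FDF.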


4.2 MILD Combustion 


Chen et al. (2021) used a feed-forward fully connected DNN to infer the joint FDF 
of mixture fraction and progress variable. This DNN is similar to the one employed 
by de Frahan et al. (2019) and can be summarized as follows: 


linear hidden layer with 5 input features and bias, LeakyReLU activation function
with $\alpha = 0.01$, and 256 output features;

batch normalization layer with 256 input and output features, and momentum
equal to 0.9;

linear hidden layer with 256 input features and bias, LeakyReLU activation function
with $\alpha = 0.01$, and 512 output features;

batch normalization layer with 512 input and output features, and momentum
equal to 0.9;

linear output layer with 512 input features and bias, softmax activation function,
and 1085 output features.
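The size of the network listed above can be checked with a short calculation. The sketch below counts trainable parameters layer by layer, assuming that each batch normalization layer carries one learnable scale and one learnable shift per feature (a common convention that is not stated in the text).

```python
# (layer type, input features, output features), as listed in the text
layers = [
    ("linear", 5, 256),
    ("batchnorm", 256, 256),
    ("linear", 256, 512),
    ("batchnorm", 512, 512),
    ("linear", 512, 1085),
]

def trainable_parameters(kind, n_in, n_out):
    """Number of trainable parameters of one layer."""
    if kind == "linear":
        return n_in * n_out + n_out  # weights plus bias
    if kind == "batchnorm":
        return 2 * n_out             # assumed: one scale and one shift per feature
    raise ValueError(f"unknown layer type: {kind}")

counts = [trainable_parameters(*layer) for layer in layers]
total = sum(counts)

for (kind, n_in, n_out), count in zip(layers, counts):
    print(f"{kind:9s} {n_in:4d} -> {n_out:4d}: {count}")
print("total:", total)
```

Under these assumptions the total comes to 691,261 trainable parameters, about 80% of which belong to the output layer.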


Thus, the two hidden layers had 256 and 512 fully connected neurons, where 
LeakyReLU activation functions were applied. Each hidden layer was followed by a 
batch normalization layer. The output layer contained 1085 neurons featuring a softmax
activation function. The loss function used was the binary cross entropy given
in Eq. 18, along with the Adam optimizer with an initial learning rate of $10^{-4}$. The model
was trained for a maximum of 1000 epochs with a batch size of 256 training samples. The
ANN features were the four moments of the joint FDF and the outputs were a total
of 1085 FDF values. A split of 80/20% between training and validation samples was
applied on the entire dataset containing about 28000 filtered DNS boxes. An early
stopping method, using a predefined number of epochs, was used for the training
to avoid overfitting. An overfitted ANN will have a validation loss that decreases for
the first several epochs but increases subsequently (Goodfellow et al. 2016).
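One common way to implement early stopping is sketched below. It reads the "predefined number of epochs" as a patience window, i.e., training stops once the validation loss has failed to improve for that many consecutive epochs. This reading, the function name and the loss values are assumptions made for illustration and are not taken from Chen et al. (2021).

```python
def early_stopping_epoch(validation_losses, patience):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has not improved for `patience`
    consecutive epochs; if that never happens, the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(validation_losses) - 1

# Hypothetical validation losses: decreasing at first, then rising (overfitting).
history = [0.90, 0.60, 0.45, 0.40, 0.41, 0.43, 0.47, 0.52]
stop = early_stopping_epoch(history, patience=3)
best = min(range(stop + 1), key=history.__getitem__)
print("stopped at epoch", stop, "with best validation loss at epoch", best)
```

In practice the weights from the best epoch, rather than from the stopping epoch, would typically be kept.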


4.3 Spray Flame 


Yao et al. (2020) used an MLP with four hidden layers and 500 neurons per layer 
to infer the Favre-filtered FDF of the mixture fraction in spray flames. As noted 
in Sect.3.3, the input quantities were 3 Ves Sal. D, P, Spray evaporation rate Ins 
relative velocity between the droplet and the surrounding gas Ua, and droplet number 
density C. The output was a vector with 60 elements since the FDF of the mixture
fraction $P(\eta)$ (where $\eta$ is the sample space variable for the mixture fraction $\xi$) was
obtained as described in Sect. 3.3. The activation function $\phi(z) = \max(0, z)$ applied
in each layer was the ReLU. A traditional stochastic gradient descent algorithm was
used to minimize the mean absolute error, which was the loss function. A total of 18
DNS cases were run to form the full datasets for the training and validation steps.
The validation (test) dataset consisted of five cases, resulting in a test/train ratio
of ~0.38. An early stopping criterion was imposed for the training process. This
ANN was also trained on the conditional scalar dissipation rate $\langle N \mid \xi = \eta \rangle$, which is
another interesting application.
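The activation function, loss function and data split quoted above are simple enough to write out directly. The snippet below is only an illustration: the target and predicted values are hypothetical, and the split arithmetic assumes that the five test cases are taken out of the 18 DNS cases, leaving 13 for training.

```python
def relu(z):
    """ReLU activation, phi(z) = max(0, z)."""
    return max(0.0, z)

def mean_absolute_error(target, prediction):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(target, prediction)) / len(target)

# Data split quoted in the text: 18 DNS cases in total, 5 of them held out.
n_cases, n_test = 18, 5
n_train = n_cases - n_test
ratio = n_test / n_train

# Hypothetical target and predicted FDF values on four sample-space bins.
target = [0.0, 0.5, 1.5, 0.2]
prediction = [relu(v) for v in [-0.1, 0.6, 1.3, 0.2]]

print("test/train ratio:", round(ratio, 2))
print("mean absolute error:", mean_absolute_error(target, prediction))
```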


5 Main Results 


5.1 FDF Predictions and Generalisation 


An overview of the ML model performance in each of the test cases is discussed 
in this section. The FDF predictions provided by ML and analytical models were 
assessed a priori using the FDFs obtained from the DNS cases. 


5.1.1 Premixed Flame 


Three different ML models, i.e., random forest (RF), conditional variational autoencoder
(CVAE), and DNN, were trained by de Frahan and coworkers using filtered
DNS data from the subvolume V3 of the low-swirl premixed flame, i.e., the algorithms
were trained on the training data from V3, and the metrics were evaluated on the
validation data from the same subvolume (see Fig. 1). Figure 5
compares the marginal FDFs $P(Z)$ and $P(c)$ obtained using the three ML models, the
$\beta$-function model and the DNS result for V3 for three different values (low, medium,
and high) of the Jensen-Shannon divergence (JSD), which measures the similarity
of two probability distributions, $Q_1$ (here the FDF from the DNS) and $Q_2$ (the FDF
from the model). The JSD is
given by


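$$J(Q_1 \| Q_2) = \frac{1}{2} \sum_{n=1}^{N} \left( Q_1(n) \ln \frac{Q_1(n)}{M(n)} + Q_2(n) \ln \frac{Q_2(n)}{M(n)} \right), \quad M(n) = \frac{Q_1(n) + Q_2(n)}{2} \qquad (19)$$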


The JSD is symmetric, i.e., $J(Q_1 \| Q_2) = J(Q_2 \| Q_1)$, and mathematically
bounded between 0 and $\ln(2)$, with 0 indicating $Q_1 = Q_2$. The JSD values for the three
samples shown in Fig. 5 were computed by considering the FDFs extracted from the
DNS of the premixed flame and those obtained by the $\beta$-$\beta$ analytical model. It can
be seen from Fig. 5 that the $\beta$-$\beta$ analytical model is unable to capture more complex
FDF shapes, such as bimodal distributions, as also confirmed by high JSD values.
Thus, the need for more accurate models is motivated. Accurate predictions can
be expected for $J(P \| P_m) < 0.3$, whereas predictions with $J(P \| P_m) > 0.6$ exhibit
incorrect median values and overall shapes.
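The Jensen-Shannon divergence between two discretised distributions can be computed in a few lines. The sketch below uses the standard definition with the mixture $M = (Q_1 + Q_2)/2$ and natural logarithms, which is consistent with the bound of $\ln(2)$ quoted above; the example distributions are hypothetical.

```python
import math

def jensen_shannon_divergence(q1, q2):
    """JSD between two discrete distributions given as equally long sequences."""
    jsd = 0.0
    for a, b in zip(q1, q2):
        m = 0.5 * (a + b)  # mixture distribution
        if a > 0.0:        # terms with zero probability contribute nothing
            jsd += 0.5 * a * math.log(a / m)
        if b > 0.0:
            jsd += 0.5 * b * math.log(b / m)
    return jsd

dns = [0.1, 0.4, 0.4, 0.1]        # hypothetical discretised DNS FDF
model = [0.25, 0.25, 0.25, 0.25]  # hypothetical model FDF
disjoint_a, disjoint_b = [1.0, 0.0], [0.0, 1.0]

print("JSD(dns, model):", jensen_shannon_divergence(dns, model))
print("JSD of disjoint distributions:", jensen_shannon_divergence(disjoint_a, disjoint_b))
print("ln(2):", math.log(2))
```

Two distributions with no overlap reach the upper bound $\ln(2)$, while identical distributions give zero.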
Fig. 5 Marginal FDF for low, mid-range, and high Jensen-Shannon divergence values for the $\beta$-$\beta$
PDF model. Red solid line is for RF model, green dashed line is for DNN model, blue dash-dotted
line is for CVAE model, orange short dashed line is for $\beta$-$\beta$ model and black solid line is the
DNS result (de Frahan et al. 2019)


228 S. Iavarone et al. 


The abilities of the three ML models to infer the subgrid FDF in regions other
than V3 were also assessed because DNS results showed that the FDFs in downstream
locations were significantly different from those for V3. So, the ML models were
trained using (1) data from V3 (volume centered at z = 0.0775 m), (2) data from V5
(volume centered at z = 0.1025 m) and (3) data collected from the odd-numbered
volumes ($j = 1, 3, 5, 7, 9$). The training data in the last case were representative
of the entire computational domain. It was found that the models trained using data
from a single volume were unable to infer the FDF in other volumes, which was
indicated by the high 90th percentile ($J_{90}$) of all the Jensen-Shannon divergence
errors. The ML models trained using the odd-numbered volumes (test 3 above) gave
$J_{90} < 0.2$ for the entire physical domain although only 4% of the DNS data from the
entire computational domain was used for the training. Among the three ML models,
the DNN yielded the lowest errors. The analytical $\beta$-$\beta$ model had $J_{90}$ values which
were almost twice those for the ML models. The sample marginal FDFs of mixture
fraction and progress variable for 3 different values of the Jensen-Shannon divergence
computed for the DNN model are shown in Fig. 6 and it is clear that the bimodal
distributions are also captured quite well by the ML models.
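The 90th percentile used above as an error metric can be illustrated as follows. The sketch interpolates linearly between ranks, which is one of several percentile conventions; the text does not say which one de Frahan et al. (2019) used, and the JSD values are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0 <= q <= 100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence is undefined")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight

# Hypothetical JSD values, one per validation sample.
jsd_values = [0.02, 0.05, 0.01, 0.30, 0.08, 0.04, 0.03, 0.12, 0.06, 0.07, 0.09]
j90 = percentile(jsd_values, 90)
print("J90:", j90)
```

A percentile is less sensitive to a few very poor predictions than the maximum, yet still reports on the tail of the error distribution rather than on its centre.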

Another generalisation test was conducted by using validation data generated from
a different time snapshot of the DNS (t = 0.059 s). For this case, the DNN model
trained on the odd-numbered volumes ($j = 1, 3, 5, 7, 9$) provided reasonable $J_{90}$ values,
although slightly higher than those obtained for the validation data from the same time
snapshot as the training data. The $\beta$-$\beta$ model provided similar errors in both cases but three
times higher than those of the DNN model. These generalisation tests demonstrated
that the learned models are able to generalize temporally, as well as spatially. The
results reported in this subsection suggest that it is important to use training data
covering the expected range of physical processes for which the ML is to be applied.


5.1.2 MILD Combustion 


For the MILD combustion cases, the FDFs provided by the DNN, $\beta$-$\beta$ and copula
models are presented and compared to the DNS FDFs in Figs. 7, 8 and 9 for cases
AZ1, AZ2 and BZ1 respectively. The DNN model significantly outperforms both
analytical models and its prediction agrees very well with the DNS data for the
different cases. As a general observation, the DNN captures the non-regular shapes
of the marginal FDF of the progress variable quite well, whereas the analytical models
given by the $\beta$ function and copula give Gaussian-like distributions. This difference
has important implications for the reaction rate modelling, as discussed later in
Sect. 5.2. For the mixture fraction, however, all models give good results but only
the DNN is able to capture the asymmetry of the FDF, which can be seen clearly
in Fig. 9b and 9d for case BZ1. These results indicate promising capabilities of the
DNN to predict the complex subgrid scalar statistics in MILD combustion.

Fig. 6 Marginal FDF for median and high Jensen-Shannon divergence values for models trained
on data from the odd-numbered volumes ($j = 1, 3, 5, 7, 9$). Red solid line is for RF, green dashed
line is for DNN, blue dash-dotted line is for CVAE, orange short dashed line is for $\beta$-$\beta$ model,
and black solid line is for DNS (de Frahan et al. 2019)

Fig. 8 Case AZ2: comparison of joint and marginal FDFs from DNS and models for filter sizes of
$\Delta^+ = 0.5$ (Chen et al. 2021)

Fig. 9 Case BZ1: comparison of joint and marginal FDFs from DNS and models for filter sizes of
$\Delta^+ = 0.5$ in (a) and (b), and $\Delta^+ = 1.0$ in (c) and (d) (Chen et al. 2021)

It was noted by Chen et al. (2021) that the FDFs extracted directly using the
instantaneous snapshots of DNS are random variables containing subgrid statistical
information, as also pointed out in Pitsch (2006) and Pope (1985). The instantaneous
FDFs present certain levels of randomness due to the unsteady nature of single
realisations. This randomness is removed to a good extent if the training data for ML
are selected over many DNS realisations at a statistically stationary state. Therefore,
following several experimental studies (Wang et al. 2007; Tong 2001; Cai et al. 2009),
the instantaneous FDFs obtained from the DNS were conditioned on the resolved
scalars, $\tilde{Z}$ and $\tilde{c}_T$, and then ensemble-averaged. A quantitative comparison of the
conditionally averaged FDFs was then performed. Two variables, $\tilde{Z}$ and $\tilde{c}_T$, were
considered as the number of available DNS samples was not sufficient to perform
a statistically meaningful averaging on the four statistical moments used as ANN
inputs. The resolved mixture fraction and progress variable were chosen so that the
selected samples were located in the reaction zone ($\tilde{c}_T \approx 0.5$). Figures 10 and 11
show the conditional FDFs, $\langle P(Z, c_T) \mid \tilde{Z}, \tilde{c}_T \rangle$, for cases AZ1 and BZ1 respectively
and the values of the conditioning variables are given in the figure captions. The DNN
accurately reproduces the conditional joint and both marginal FDFs. It also captures
the significant changes in the FDF shape with the varying filter size, especially for
the progress variable. For case AZ1, both the $\beta$ and copula models overpredict the
peak when $\Delta^+ < 1$ for both $Z$ and $c_T$ distributions. However, for $\Delta^+ = 1.5$, the
overall prediction is good for $P(Z)$ and the peak of $P(c_T)$ is also close to the DNS
value although the shape is not captured. Similar results were reported for case AZ2
also. For case BZ1, the mixture fraction distribution is predicted fairly well by all
models for different $\Delta^+$ values. However, both analytical models fail to predict the
bimodal-plateau shape of $P(c_T)$, which is typical of MILD combustion but seldom
seen in conventional flames.
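The conditioning and ensemble-averaging step described above can be illustrated with a small sketch: samples are grouped into bins of the two resolved scalars and the FDFs within each bin are averaged. The bin widths, the sample values and the function name are hypothetical; the actual procedure of Chen et al. (2021) may differ in its details.

```python
from collections import defaultdict

def conditional_mean_fdf(samples, z_width, c_width):
    """Ensemble-average FDFs over samples falling in the same (Z, c_T) bin.

    `samples` is a list of (z_filtered, c_filtered, fdf) tuples, where `fdf` is a
    list of discretised FDF values with the same length for every sample.
    """
    groups = defaultdict(list)
    for z, c, fdf in samples:
        key = (int(z // z_width), int(c // c_width))  # index of the conditioning bin
        groups[key].append(fdf)
    averaged = {}
    for key, fdfs in groups.items():
        n = len(fdfs)
        averaged[key] = [sum(column) / n for column in zip(*fdfs)]
    return averaged

# Hypothetical instantaneous FDFs (three bins each) with their resolved scalars.
samples = [
    (0.0031, 0.48, [0.2, 0.5, 0.3]),
    (0.0033, 0.49, [0.4, 0.3, 0.3]),
    (0.0071, 0.52, [0.1, 0.1, 0.8]),
]
result = conditional_mean_fdf(samples, z_width=0.001, c_width=0.1)
for key, fdf in sorted(result.items()):
    print(key, fdf)
```

The first two samples fall in the same conditioning bin and are averaged, whereas the third is left on its own, which shows why enough samples per bin are needed for the average to be statistically meaningful.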

The JSD values were also calculated using Eq. (19) for the DNN and the two
analytical models, which confirmed the observations made using Figs. 7, 8, 9, 10 and
11. The JSD values provided by the DNN were much lower than those for the $\beta$ and
copula models. Improved predictions and lower JSD values were observed for all the
models by increasing the filter size and this improvement was particularly significant
for the DNN having $J_{90} < 0.05$. The DNN model performed equally well for $Z$
and $c_T$.

To check for generalisation capability, the DNN was further validated using data
which were not included in the learning/training step. The training and validation
datasets included snapshots taken from $t = \tau_f$ to $1.2\tau_f$, where $\tau_f$ is the flow-through
time, but the test data were taken from snapshots between $1.4\tau_f$ and $1.5\tau_f$.
Substantial variations in the MILD combustion behaviour were observed among
these snapshots (see Doan et al. 2018 for details). Hence, a robustly trained DNN
is attractive if it can accurately infer a quantity of interest (here, FDF) for scenarios
that have not been explicitly seen during the training process. The PDFs of the JSD
values for the self-predictions (i.e., predictions performed on the training datasets)
and unknown-predictions of the FDF are shown in Fig. 12. A filter size of $\Delta^+ = 1$
was used for all cases. As indicated in Fig. 12, the DNN provides a similar level of
accuracy when unseen test data points are fed to the model. More than 80% of the
JSD values are smaller than 0.05. The advantage of using the DNN as an FDF model is
still unaffected since the majority of JSD values were larger than 0.1 for the $\beta$ and
copula FDF models. A slightly worse performance was achieved by the DNN when
the training data came from cases AZ1 and BZ1, and the validation was done on
case AZ2. The JSD results obtained from this new test with the self-predictions for
$\Delta^+ = 0.5$ indicated that the overall performance was still good although the JSD
distribution shifted towards higher JSD values. Further improvement in predictions
is expected to be achieved if more datasets with different scenarios are included in
the training.

Fig. 11 Case BZ1: comparison of joint and marginal FDFs from DNS and models for a and b
$\Delta^+ = 0.5$, $\tilde{Z} = 0.00034$, $\tilde{c}_T = 0.48$; and c and d $\Delta^+ = 1$, $\tilde{Z} = 0.0036$, $\tilde{c}_T = 0.46$ (Chen et al.
2021)


5.1.3 Spray Flame 


Yao et al. (2020) visually compared the FDF predicted by the ANN and the $\beta$-function
model with the DNS values for one of the validation cases (CX1). Moreover, the
data samples of this case were divided into three different groups characterized by
filter size $\Delta_{LES}$, to assess the sensitivity of the trained ANN model to LES grid
sizes. The LES cells were selected randomly for a given $\tilde{\xi}$ ranging from fuel-lean to
fuel-rich conditions. The stoichiometric mixture fraction value is $\xi_{st} = 0.068$.
Fig. 12 Comparison of Jensen-Shannon divergence for DNN self- and unknown-predictions of
FDF of a progress variable and b mixture fraction. The filter size for all cases is $\Delta^+ = 1.0$ (Chen
et al. 2021)


Figure 13 compares the FDF computed using the ANN and the $\beta$-function with DNS results
for two filtered mixture fraction values and three $\Delta_{LES}$. There are no marked differences
in the ANN prediction for different $\Delta_{LES}$. The ANN predictions of $P(\eta)$ are
in excellent agreement with the DNS results, including the peak value and its location.
The FDF is skewed towards the lean side ($\eta < \xi_{st}$) for $\tilde{\xi} = 0.05$ whereas it is
stretched towards the rich side for $\tilde{\xi} = 0.10$, and even a bimodal behaviour appears
at larger filter sizes. The $\beta$-function does not seem to represent the FDFs well and
numerical issues can arise when the mean is close to zero or unity with small SGS
variance (Kronenburg et al. 2000).


5.2 Reaction Rate Predictions 


The filtered reaction rates inferred by the ML models were also assessed against DNS
results by de Frahan et al. (2019) for their premixed flame and by Chen et al. (2021)
for the MILD combustion cases. The ML models used by de Frahan et al. inferred
the unconditional filtered reaction rates $\overline{\dot{\omega}}$, which are computed according to Eq. 6,
and are shown in Fig. 14. Significant overpredictions were observed for the $\beta$-$\beta$
model. The comparisons of the conditional reaction rates are also shown in Fig. 14.

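The reaction rate in the transport equation for the filtered temperature-based
progress variable, $\overline{\dot{\omega}}_{c_T}$, can be computed using

$$\overline{\dot{\omega}}_{c_T} = \bar{\rho} \int_0^1 \int_0^1 \langle \dot{\omega}_{c_T} \rangle \, P(Z, c_T) \, \mathrm{d}Z \, \mathrm{d}c_T, \qquad (20)$$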




[Figure 13: three panels per row, titled $(\Delta_{LES})^3 = (8\Delta_{DNS})^3$, $(16\Delta_{DNS})^3$ and $(32\Delta_{DNS})^3$; curves for DNS, ANN and $\beta$-FDF]

Fig. 13 Validation of ANN predictions of $P(\eta)$ with DNS results for different LES grid sizes. The
results are shown for $\tilde{\xi} = 0.05$ (top) and $\tilde{\xi} = 0.1$ (bottom) (Yao et al. 2020)


where the joint FDF $P(Z, c_T)$ is obtained through the ANN in the MILD combustion
cases investigated by Chen et al. (2021). The symbol $\langle \dot{\omega}_{c_T} \rangle =
\langle \dot{\omega}_{c_T}(\mathbf{x}, t)/\rho(\mathbf{x}, t) \mid Z, c_T \rangle$ is defined as the doubly conditional mean reaction rate
obtained from the DNS data. The instantaneous reaction rate of $c_T$ is defined as
$\dot{\omega}_{c_T} = \dot{q} / [c_p (T_p - T_r)]$, with $\dot{q}$ and $c_p$ being the volumetric heat release rate and
specific heat capacity of the mixture respectively. The conditional averages are computed
using samples collected over the entire computational domain, see Sect. 3.2,
and all the snapshots available (approximately 60) to achieve good statistical convergence. The
authors verified that the doubly conditional mean rates have negligible variations in
time and space, supporting the assumption of many turbulent combustion models
(viz., flamelets, see Bradley et al. 1990; Fiorina et al. 2003; Pierce and Moin 2004;
van Oijen et al. 2016; and conditional moment-based methods, see Klimenko and
Bilger 1999; Steiner and Bushe 2001) that the conditional means have small temporal
and spatial variations if appropriate conditioning variables are used. The target
filtered reaction rate $\overline{\dot{\omega}}_{c_T}^{\mathrm{DNS}}$ was obtained by computing both the conditional mean
reaction rate and the FDF in Eq. 20 directly from the DNS data. The scatter plots
of $\overline{\dot{\omega}}_{c_T}^{\mathrm{DNS}}$ and the reaction rates computed using FDFs obtained through the $\beta$, copula
and DNN models are presented in Fig. 15 for one of the DNS cases (AZ1) investigated
in Chen et al. (2021). The qualitative behaviours and the trends were found
to be similar for the other two cases. Although all models give reasonable predictions,
the DNN outperforms the analytical models for all filter sizes. Moreover, the
DNN predictions generally exhibit good symmetry about the diagonal, indicating
a bias towards neither under- nor over-prediction, while the scatters for both the $\beta$
and copula models are asymmetric. As $\Delta^+$ increases, the DNN prediction improves
considerably whereas the performance of the analytical models does not follow this
trend with the filter size. For both the $\beta$ and copula models, a trend in the off-diagonal
samples moving from under-predictions at small $\Delta^+$ to over-predictions at larger $\Delta^+$
can be seen.

Fig. 14 Reaction rate $\overline{\dot{\omega}}$ inferred by the ML models trained on data from the odd-numbered
volumes ($j = 1, 3, 5, 7, 9$). Red squares
and solid line are for RF model, green diamonds and dashed line are for DNN, blue circles and
dash-dotted line are for CVAE, orange pentagons and short dashed line are for $\beta$-$\beta$ model, and
black solid line is for DNS result (de Frahan et al. 2019)
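A discrete counterpart of the double integral in Eq. 20 is sketched below: the conditional mean reaction rate is weighted by the joint FDF and summed over a uniform grid in $(Z, c_T)$ with the midpoint rule. The grid, the numerical values and the density prefactor `rho_bar` are assumptions made for illustration; the quadrature actually used by Chen et al. (2021) is not described in the text.

```python
def filtered_reaction_rate(cond_rate, fdf, dz, dc, rho_bar=1.0):
    """Midpoint-rule estimate of rho_bar * double integral of cond_rate * fdf.

    `cond_rate[i][j]` and `fdf[i][j]` are values at the centre of cell (i, j)
    of a uniform grid in (Z, c_T) with cell sizes `dz` and `dc`.
    """
    total = 0.0
    for rate_row, fdf_row in zip(cond_rate, fdf):
        for rate, p in zip(rate_row, fdf_row):
            total += rate * p * dz * dc
    return rho_bar * total

# Hypothetical 2 x 2 grid on the unit square, so dz = dc = 0.5.
dz = dc = 0.5
fdf = [[1.6, 0.8], [0.8, 0.8]]          # hypothetical joint FDF values
cond_rate = [[0.0, 10.0], [5.0, 20.0]]  # hypothetical conditional mean rates

normalisation = sum(sum(row) for row in fdf) * dz * dc
rate = filtered_reaction_rate(cond_rate, fdf, dz, dc)
print("FDF normalisation:", normalisation)
print("filtered reaction rate:", rate)
```

Checking that the discretised FDF integrates to one before using it is a cheap safeguard, since any normalisation error carries over directly to the filtered rate.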


6 Conclusions and Prospects 


The application of ML algorithms to infer subgrid-scale filtered density functions
(FDFs) in three test cases, i.e., swirling premixed flame, MILD and spray combustion,
has been discussed in this chapter. In particular, the promising results provided
by deep neural networks (DNNs) for accurately inferring the FDFs have been shown.
DNNs are generally able to capture the complex FDF behaviours and their variations
with great accuracy across various combustion scenarios, turbulent and thermochemical
conditions, and LES filter sizes. This can be achieved by manipulating the input
data (extracted from DNS of these three cases), changing the network architecture,
and tuning the network hyperparameters (e.g., learning rate, batch size). It has been
shown that if the DNN training dataset is heterogeneous, i.e., it contains different
possible outcomes of the quantities of interest, the DNN can handle unknown inputs
quite well, suggesting good model robustness. Thus, the DNN can be applied as a
black-box model to other cases. By contrast, analytical models such as the $\beta$-function
and copula models in most cases show their limitations quite clearly.

(denoted using different markers) for case AZ1. The results for different filter sizes are also
shown (Chen et al. 2021)

Although the above observations demonstrate the potential of DNN-based FDF 
modelling in combustion, several challenges remain and require further investiga- 
tions. Searching for an optimal combination of the DNN hyperparameters can be 
highly time-consuming and computationally expensive. For example, an exhaustive 
grid search, looping through all combinations of layers and neurons to find an opti- 
mum, is not an easy task and may require cloud computing services (Yao et al. 2020). 
Moreover, due to the black-box nature of ML models, it is often hard to debug them 
to a satisfying level or improve them substantially after such a level is reached. This 
shifts the attention to the preprocessing of training data, which can be a daunting 
and time-consuming task, as mentioned in Chen et al. (2021). The lack of physical 
constraints in the training of ML models is yet another issue, and research is ongoing 
to develop physics-informed ML models that can respect physical laws and increase 
the interpretability and generalisation capability of ML models. 
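The cost of an exhaustive grid search grows with the product of the number of candidate values per hyperparameter, which the following sketch makes explicit. The candidate values and the scoring function are placeholders: in a real study, each evaluation would train a network and return its validation loss.

```python
import itertools

def grid_search(candidates, score):
    """Evaluate `score` for every combination in `candidates` and return the best.

    `candidates` maps a hyperparameter name to a list of values; `score` takes a
    dict of hyperparameters and returns a validation loss (lower is better).
    """
    names = list(candidates)
    best_params, best_loss, n_evaluated = None, float("inf"), 0
    for values in itertools.product(*(candidates[name] for name in names)):
        params = dict(zip(names, values))
        loss = score(params)
        n_evaluated += 1
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss, n_evaluated

def placeholder_loss(params):
    # Stand-in for "train the network and return its validation loss".
    return abs(params["layers"] - 3) + abs(params["neurons"] - 256) / 256

candidates = {"layers": [1, 2, 3, 4], "neurons": [64, 128, 256, 512]}
best_params, best_loss, n_evaluated = grid_search(candidates, placeholder_loss)
print(best_params, best_loss, n_evaluated)
```

With only two hyperparameters and four values each, 16 trainings are already needed; adding the learning rate and batch size with four values each would raise this to 256.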

If DNNs are to replace combustion models, the overhead of retrieving predictions
can also be of concern and counterbalance the observed savings in storage requirements.
The overhead associated with the use of DNNs is highly machine-dependent
and also network size-dependent. A posteriori LES studies need to quantify the computational
times required by the DNN inference of FDFs and mean reaction rates.
High inference times could hinder the development of in-situ capabilities, where the
ML model is trained during the simulation, which can mitigate the risk of extrapolation.
The latter can also be reduced by combining ML training and applications with
uncertainty quantification or sensitivity analysis approaches that can effectively verify
the performance of the ML model, provide a level of confidence in its predictions,
guarantee that it does not violate physical laws and promote its more comprehensive
application.

Machine learning has induced notable advancements in combustion science. It
has been effectively used for finding hidden patterns in large amounts of data,
exploring and visualising high-dimensional input spaces, deriving complex mappings
from inputs to outputs, and reducing computational cost and memory occupation
(Zhou et al. 2022). However, many challenges and hence research opportunities
are left to be addressed, and the development of physics-based ML approaches is
just the starting point of a scientific paradigm shift that will bring new insights into
combustion science with the help of big data. The combination of ML and combustion
will provide solutions to daunting problems and enhance the understanding
and deployment of novel combustion processes and technologies, which will shape
a cleaner and more sustainable future energy arena.
Abstract Data-driven modeling of complex dynamical systems is becoming increasingly
popular across various domains of science and engineering. This is thanks to
advances in numerical computing, which provides high fidelity data, and to algorithm
development in data science and machine learning. Simulations of multicomponent
reacting flows can particularly profit from data-based reduced-order modeling
(ROM). The original system of coupled partial differential equations that describes a
reacting flow is often large due to the high number of chemical species involved. While
the datasets from reacting flow simulations have high state-space dimensionality, they
also exhibit attracting low-dimensional manifolds (LDMs). Data-driven approaches
can be used to obtain and parameterize these LDMs. Evolving the reacting system
using a smaller number of parameters can yield substantial model reduction and
savings in computational cost. In this chapter, we review recent advances in ROM of
turbulent reacting flows. We demonstrate the entire ROM workflow with a particular
focus on obtaining the training datasets and data science and machine learning techniques
such as dimensionality reduction and nonlinear regression. We present recent
results from ROM-based simulations of experimentally measured Sandia flames D
and F. We also delineate a few remaining challenges and possible future directions
to address them. This chapter is accompanied by illustrative examples using the
recently developed Python software, PCAfold. The software can be used to obtain,
analyze and improve low-dimensional data representations. The examples provided
herein can be helpful to students and researchers learning to apply dimensionality
reduction, manifold approaches and nonlinear regression to their problems. The
Jupyter notebook with the examples shown in this chapter can be found on GitHub at
https://github.com/kamilazdybal/ROM-of-reacting-flows-Springer.


1 Introduction 


There is growing interest and numerous recent developments in reduced-order mod- 
eling (ROM) of complex dynamical systems (Kutz et al. 2016; Taira et al. 2017; Lusch 
et al. 2018; Mendez et al. 2019; Raissi et al. 2019; Dalakoti et al. 2020; Ramezanian 
et al. 2021; Han et al. 2022; Zhou et al. 2022). While these systems can be character- 
ized by a large number of degrees of freedom, they often exhibit low-rank structures 
(Maas and Pope 1992; Holmes et al. 1997; Pope 2013; Yang et al. 2013; Mendez et al. 
2018). Describing the evolution of those structures provides a powerful modeling 
approach with substantial reduction to the number of partial differential equations 
(PDEs) solved in computational simulations (Sutherland and Parente 2009; Biglari 
and Sutherland 2015; Echekki and Mirgolbabaei 2015; Owoyele and Echekki 2017; 
Malik et al. 2018, 2020). 

Reacting flow simulations can profit from model reduction due to initially high 
state-space dimensionality stemming from large chemical mechanisms. Reacting 
systems can often be effectively re-parameterized with much fewer variables. Numer- 
ous physics-based parameterization techniques can be found in the combustion lit- 
erature (Maas and Pope 1992; Van Oijen and De Goey 2002; Jha and Groth 2012; 
Gicquel et al. 2000). An alternative to the physics-motivated parameterization is a 
data-driven approach, where low-dimensional manifolds (LDMs) are constructed 
directly from the training data (Sutherland and Parente 2009; Yang et al. 2013). In 
particular, dimensionality reduction techniques can be used to define LDMs in the 
original thermo-chemical state-space. Among many available linear and nonlinear 
techniques, principal component analysis (PCA) (Jolliffe 2002) is commonly used 
in combustion to obtain a linear mapping between the original variables and the 
LDM (Sutherland and Parente 2009; Mirgolbabaei and Echekki 2013; Echekki and 
Mirgolbabaei 2015; Isaac et al. 2015; Biglari and Sutherland 2015). In PCA, the 
new parameterizing variables, called principal components (PCs), can be obtained 
by projecting the training data onto a newly identified basis. A small number of the 
first few PCs defines the LDM. ROMs can then be built based on this new parameterization.
As one example of ROM, PDEs describing the first few PCs can be evolved
in combustion simulations (Sutherland and Parente 2009), which results in a substantial
reduction of computational costs as compared to transporting the original state
variables.
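The projection onto the first PC can be illustrated without any external library. The sketch below centers a small hypothetical dataset, builds its covariance matrix and finds the leading eigenvector by power iteration. Power iteration is used here only to keep the example self-contained; in practice an eigendecomposition or singular value decomposition routine, or a dedicated package such as PCAfold, would be the usual choice.

```python
import math

def first_principal_component(data, n_iter=200):
    """Return (mean, unit vector of the first PC) for a list of equal-length rows."""
    n, d = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(d)]
    centered = [[row[k] - mean[k] for k in range(d)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(data, mean, v):
    """Score of each observation along the direction v (the first PC)."""
    return [sum((row[k] - mean[k]) * v[k] for k in range(len(v))) for row in data]

# Hypothetical three-variable state vectors lying close to a straight line.
data = [[0.0, 0.1, 1.0], [1.0, 2.1, 0.5], [2.0, 3.9, 0.0], [3.0, 6.0, -0.5]]
mean, pc1 = first_principal_component(data)
scores = project(data, mean, pc1)
print("first PC direction:", pc1)
print("PC scores:", scores)
```

Here three original variables are replaced by a single score per observation, which is the sense in which a small number of PCs defines the LDM.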

Often, ROM workflows incorporate nonlinear regression to bypass the reconstruction
errors associated with an inverse basis transformation. Regression can thus
provide an effective route back from the reduced space to the original state-space
where the thermo-chemical quantities of interest, such as temperature, pressure and
composition, can be retrieved. Regression models can also provide closure for any
non-conserved manifold parameters. Nonlinear regression techniques such as artificial
neural networks (ANNs) (Mirgolbabaei and Echekki 2014; Dalakoti et al. 2020),
multivariate adaptive regression splines (MARS) (Biglari and Sutherland 2015) or
Gaussian process regression (GPR) (Isaac et al. 2015; Malik et al. 2018, 2020) have been
used in the past in the context of ROM.
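As a minimal illustration of regression from a manifold parameter back to a state variable, the sketch below implements a Nadaraya-Watson kernel regression with a Gaussian kernel. The kernel form, the bandwidth and the data are assumptions chosen for brevity; the techniques cited above (ANN, MARS, GPR) are considerably more elaborate.

```python
import math

def kernel_regression(query, train_x, train_y, bandwidth):
    """Nadaraya-Watson estimate of y at `query` with a Gaussian kernel."""
    weights = [math.exp(-((query - x) / bandwidth) ** 2) for x in train_x]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Hypothetical one-dimensional manifold parameter and a dependent variable
# (e.g. a scaled temperature) that varies nonlinearly with it.
train_x = [i / 20 for i in range(21)]
train_y = [math.sin(math.pi * x) for x in train_x]

estimate = kernel_regression(0.5, train_x, train_y, bandwidth=0.05)
print("estimate at 0.5:", estimate)
```

The bandwidth controls the trade-off between smoothing and fidelity: a wider kernel averages over more neighbours and flattens sharp features of the dependent variable.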

In this chapter, we present the complete ROM workflow for application in react- 
ing flow simulations. We begin with a concise mathematical description of a general 
multicomponent reacting flow. Understanding the governing equations of the ana- 
lyzed system is a crucial starting point for applying data science tools on the resulting 
thermo-chemical state vector. After a discussion of training datasets, we present the 
derivation of the ROM in the context of reacting flows. We review the combina- 
tion of dimensionality reduction techniques with nonlinear regression. We discuss 
three popular choices for nonlinear regression: ANNs, GPR and kernel regression. 
Finally, we review recent results from a priori and a posteriori ROM of challenging 
combustion simulations. 

Throughout this chapter, we delineate a few outstanding challenges that remain 
in ROM of combustion processes. For instance, projecting the data onto a lower- 
dimensional basis, as is done in many ROMs, can introduce undesired behaviors 
on LDMs. Observations that are distant in the original space can be collapsed into a 
single, overlapping region. In the overlapping region, those observations are indistin- 
guishable and the projection can become multi-valued. When the identified manifold 
is used as regressor, these topological behaviors on LDMs can make the regression 
process more difficult. Ideally, we would like to search for such parameters defining 
the LDM, that the resulting regression function uniquely represents all dependent 
variables. Recent work by Zhang et al. (2020) has demonstrated that regressing vari- 
ables that have significant spatial gradients can be challenging using ANN. Steep 
gradients can be particularly associated with minor species whose non-zero mass 
fractions can be located on small portions of the manifold. Problems with ANN recon- 
struction of minor species on a PCA-derived manifold have recently been reported by 
Dalakoti et al. (2020). Nevertheless, attempts to link poor regression performance
with the manifold topology are still scarce in the existing literature, with only
a few studies emerging recently (Malik et al. 2022a; Perry et al. 2022; Zdybał et al.
2022c). We show examples of quantitative measures to assess the quality of LDMs
that can help bridge this gap. We argue that future research efforts should focus
on advancing strategies that improve regression on manifolds. This should make it
possible to better leverage the capability of techniques such as ANNs or GPR to
approximate even highly nonlinear relationships between variables (Hornik et al. 1989).




PCAfold examples 

The present chapter includes illustrative examples using PCAfold (Zdybał
et al. 2020), a Python software package for generating, analyzing
and improving LDMs. It incorporates the entire ROM workflow from 
data preprocessing, through dimensionality reduction to novel tools for 
assessing the quality of LDMs. PCAfold is composed of three main 
modules: preprocess, reduction and analysis. In brief, the 
preprocess module allows for data preprocessing such as centering 
and scaling, sampling, clustering and outlier removal. The reduction 
module introduces dimensionality reduction using PCA. The available 
variants are global and local PCA, subset PCA and PCA on sampled 
datasets. Finally, the analysis module combines functionalities for 
assessing LDM quality and nonlinear regression results. Each mod- 
ule is accompanied by plotting functions that allow for efficient view- 
ing of results. For instructions on installing the software and for fur- 
ther illustrative tutorials, the reader is referred to the documentation: 
https://pcafold.readthedocs.io/. In the PCAfold exam- 
ples that follow, we present a complete workflow that can be adopted for a 
combustion dataset, using all three modules in series: preprocess →
reduction → analysis. We begin by importing the three modules:


from PCAfold import preprocess 
from PCAfold import reduction 
from PCAfold import analysis 


2 Governing Equations for Multicomponent Mixtures 


In this section, we begin with the description of the governing equations for low-Mach
multicomponent mixtures, whose solution is the starting point for obtaining training
datasets for ROMs in reacting flow applications. In the discussion that follows, $\nabla \cdot \boldsymbol{\phi}$
denotes the divergence of a vector quantity $\boldsymbol{\phi}$, $\nabla \boldsymbol{\phi}$ (or $\nabla \phi$) denotes the gradient of a
vector quantity $\boldsymbol{\phi}$ (or a scalar quantity $\phi$) and the $:$ symbol denotes tensor contraction.
The material derivative is defined as $D/Dt := \partial/\partial t + \mathbf{v} \cdot \nabla$. We let $\mathbf{v}$ be the
mass-averaged convective velocity of the mixture, defined as

$$\mathbf{v} := \sum_{i=1}^{n} Y_i \mathbf{u}_i, \tag{1}$$

where $Y_i$ is the mass fraction of species $i$, $\mathbf{u}_i$ is the velocity of species $i$ and $n$ is the
number of species in the mixture. At a given point in space and time, transport of
physical quantities in a multicomponent mixture can be described by the following
set of governing equations written in the conservative (strong) form:
• Continuity equation:

$$\frac{\partial \rho}{\partial t} = -\nabla \cdot \rho \mathbf{v}, \tag{2}$$

where $\rho$ is the mixture density.
• Species mass conservation equation:

$$\frac{\partial \rho Y_i}{\partial t} = -\nabla \cdot \rho Y_i \mathbf{v} - \nabla \cdot \mathbf{j}_i + \omega_i \quad \text{for } i = 1, 2, \dots, n-1, \tag{3}$$

where $\mathbf{j}_i$ is the mass diffusive flux of species $i$ relative to the mass-averaged velocity
and $\omega_i$ is the net mass production rate of species $i$ due to chemical reactions.
Note that summation of Eqs. (3) over all $n$ species yields the continuity equation
(Eq. (2)) since $\sum_{i=1}^{n} Y_i = 1$, $\sum_{i=1}^{n} \mathbf{j}_i = 0$ and $\sum_{i=1}^{n} \omega_i = 0$. For this reason, only
$n - 1$ independent species mass conservation equations are solved. The mass fraction
of the $n$th species can be computed from the constraint $\sum_{i=1}^{n} Y_i = 1$.

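• Momentum equation:

$$\frac{\partial \rho \mathbf{v}}{\partial t} = -\nabla \cdot \rho \mathbf{v} \mathbf{v} - \nabla \cdot \boldsymbol{\tau} - \nabla \cdot p \mathbf{I} + \rho \sum_{i=1}^{n} Y_i \mathbf{f}_i, \tag{4}$$

where $\boldsymbol{\tau}$ is the viscous momentum flux tensor, $p$ is pressure, $\mathbf{I}$ is the identity tensor
and $\mathbf{f}_i$ is the net acceleration from body forces applied on species $i$.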

with one of the following forms of the energy equation: 

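• Total internal energy equation:

$$\frac{\partial \rho e_0}{\partial t} = -\nabla \cdot \rho e_0 \mathbf{v} - \nabla \cdot \mathbf{q} - \nabla \cdot (\boldsymbol{\tau} \cdot \mathbf{v}) - \nabla \cdot (p \mathbf{v}) + \sum_{i=1}^{n} \mathbf{f}_i \cdot \mathbf{n}_i, \tag{5}$$

where $e_0$ is the mixture specific total internal energy, $\mathbf{q}$ is the heat flux and
$\mathbf{n}_i := \rho Y_i \mathbf{u}_i$ is the total mass flux of species $i$.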
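• Internal energy equation:

$$\frac{\partial \rho e}{\partial t} = -\nabla \cdot \rho e \mathbf{v} - \nabla \cdot \mathbf{q} - \boldsymbol{\tau} : \nabla \mathbf{v} - p \nabla \cdot \mathbf{v} + \sum_{i=1}^{n} \mathbf{f}_i \cdot \mathbf{j}_i, \tag{6}$$

where $e$ is the mixture specific internal energy.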
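• Enthalpy equation:

$$\frac{\partial \rho h}{\partial t} = -\nabla \cdot \rho h \mathbf{v} - \nabla \cdot \mathbf{q} - \boldsymbol{\tau} : \nabla \mathbf{v} + \frac{Dp}{Dt} + \sum_{i=1}^{n} \mathbf{f}_i \cdot \mathbf{j}_i, \tag{7}$$

where $h$ is the mixture specific enthalpy.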
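• Temperature equation:

$$\frac{\partial \rho T}{\partial t} = -\nabla \cdot \rho T \mathbf{v} - \frac{1}{c_p} \nabla \cdot \mathbf{q} + \frac{\alpha T}{c_p} \frac{Dp}{Dt} - \frac{1}{c_p} \boldsymbol{\tau} : \nabla \mathbf{v} + \frac{1}{c_p} \sum_{i=1}^{n} \left( h_i (\nabla \cdot \mathbf{j}_i - \omega_i) + \mathbf{f}_i \cdot \mathbf{j}_i \right), \tag{8}$$

where $T$ is the temperature, $\alpha$ is the coefficient of thermal expansion of the mixture
($\alpha = 1/T$ for an ideal gas), $c_p$ is the mixture isobaric specific heat capacity and
$h_i$ is the enthalpy of species $i$.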


The governing equations can also be re-formulated using a reference velocity different
from the mass-averaged velocity used here. A different mixture velocity would
not only change the terms that involve $\mathbf{v}$ explicitly, but would also require an
appropriately reformulated diffusive flux.

The set of governing equations is closed by a few additional relations. The first
one is an equation of state. For an ideal gas, we have

$$p = \frac{\rho R_u T}{M}, \tag{9}$$

where $R_u$ is the universal gas constant and $M = \left( \sum_{i=1}^{n} Y_i / M_i \right)^{-1}$ is the molar mass
of the mixture where $M_i$ is the molar mass of species $i$. For a chemically reacting
flow, we also require a chemical mechanism that relates temperature, $T$, pressure, $p$,
and composition, $[Y_1, Y_2, \dots, Y_n]$, to the chemical source terms, $\omega_i$. The heat flux,
$\mathbf{q}$, requires modeling as it can in general include all possible means of heat transfer.
One model encountered for $\mathbf{q}$ can be written using the standard Fourier term and the
term representing heat transfer through molecular diffusion of species:


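$$\mathbf{q} = -\lambda \nabla T + \sum_{i=1}^{n} h_i \mathbf{j}_i, \tag{10}$$

where $\lambda$ is the mixture thermal conductivity. We also require a model for the diffusive
fluxes, $\mathbf{j}_i$. Assuming Fick’s law as a model for diffusion, we can express the mass
diffusive flux as

$$\mathbf{j}_i = -\rho \mathbf{D} \nabla Y_i, \tag{11}$$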


where $\mathbf{D}$ is a matrix of Fickian diffusion coefficients that are functions of the binary
diffusion coefficients and composition. Finally, we require a model for the viscous
momentum flux tensor, $\boldsymbol{\tau}$. Assuming Newtonian fluids, $\boldsymbol{\tau}$ can be expressed as:

$$\boldsymbol{\tau} = -\mu \left( \nabla \mathbf{v} + (\nabla \mathbf{v})^{\mathsf{T}} \right) + \left( \frac{2}{3} \mu - \kappa \right) (\nabla \cdot \mathbf{v}) \mathbf{I}, \tag{12}$$

where $\mu$ is the mixture viscosity, $\kappa$ is the mixture dilatational viscosity and the superscript $\mathsf{T}$
denotes matrix transpose. The reader is referred to numerous excellent resources for a deeper
discussion of multicomponent mass transfer or derivation of the equations above
(Taylor and Krishna 1993; Giovangigli 1999; Bird et al. 2006; Kee et al. 2005).
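
As a brief illustration of the closure relations above, the following NumPy sketch evaluates the mixture molar mass, the ideal-gas density and the Newtonian viscous momentum flux tensor of Eq. (12). The function names, the two-species air composition and the numerical values are assumptions chosen only for illustration; they are not taken from any dataset used in this chapter.

```python
import numpy as np

R_U = 8.314462618  # universal gas constant, J/(mol K)

def mixture_molar_mass(Y, M_species):
    """Mixture molar mass, M = (sum_i Y_i / M_i)^(-1), from all n mass fractions."""
    return 1.0 / np.sum(np.asarray(Y) / np.asarray(M_species))

def ideal_gas_density(p, T, Y, M_species):
    """Density from the ideal-gas equation of state, rho = p M / (R_u T)."""
    return p * mixture_molar_mass(Y, M_species) / (R_U * T)

def newtonian_stress(grad_v, mu, kappa=0.0):
    """Viscous momentum flux tensor, Eq. (12), for a (d x d) velocity gradient."""
    grad_v = np.asarray(grad_v, dtype=float)
    div_v = np.trace(grad_v)  # divergence of velocity
    identity = np.eye(grad_v.shape[0])
    return -mu * (grad_v + grad_v.T) + (2.0 / 3.0 * mu - kappa) * div_v * identity

# Assumed composition: air approximated as O2 and N2 (mass fractions)
Y = [0.233, 0.767]
M_species = [0.031998, 0.028014]  # kg/mol
rho = ideal_gas_density(p=101325.0, T=300.0, Y=Y, M_species=M_species)

# Pure shear flow with a single non-zero velocity gradient component (1/s)
grad_v = np.array([[0.0, 10.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
tau = newtonian_stress(grad_v, mu=1.8e-5)
```

With zero dilatational viscosity, the tensor returned by `newtonian_stress` is symmetric and traceless, which is a quick sanity check on the signs in Eq. (12).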
The governing equations given by Eqs. (2)-(8) can be written in a general matrix
form:

$$\frac{\partial \mathbf{X}^{\mathsf{T}}}{\partial t} = -\nabla \cdot \mathbf{C}^{\mathsf{T}} - \nabla \cdot \mathbf{D}^{\mathsf{T}} + \mathbf{S}^{\mathsf{T}}, \tag{13}$$

where $\mathbf{X} \in \mathbb{R}^{N \times Q}$ is the thermo-chemical state vector, $\mathbf{C} \in \mathbb{R}^{d \times N \times Q}$ is the convective
flux vector, $\mathbf{D} \in \mathbb{R}^{d \times N \times Q}$ is the diffusive flux vector and $\mathbf{S} \in \mathbb{R}^{N \times Q}$ is the source terms
vector. Here, $Q$ is the number of transported properties, $d$ is the number of spatial
dimensions of the problem and $N$ is the number of observations. The observations
can for instance be linked to measurements on a spatio-temporal grid of a discretized
domain. Typically, $N \gg Q$, but the magnitude of $Q$ strongly depends on the number
of species in the mixture. In combustion problems, $Q$ can easily reach the order
of hundreds when large chemical mechanisms are used (Lu and Law 2009). The
appropriate formulation of $\mathbf{X}$, $\mathbf{C}$, $\mathbf{D}$ and $\mathbf{S}$ will depend on a given problem and the
assumed simplifications to the governing equations. In the most general case, when
all transport equations are solved and no further simplifications are made to the
governing equations as given by Eqs. (2)-(8), we form the columns of $\mathbf{X}$, $\mathbf{C}$, $\mathbf{D}$ and
$\mathbf{S}$ as per Table 1. Note that the order of columns in $\mathbf{X}$ does not matter, as long as the
corresponding column in $\mathbf{C}$, $\mathbf{D}$ and $\mathbf{S}$ carries an appropriate term. Since the
thermo-chemical state of a single-phase multicomponent system is defined by $Q = n + 1$


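Table 1 Formulation of the thermo-chemical state vector, X, the convective flux vector, C, the
diffusive flux vector, D, and the source terms vector, S, in the most general case, where no further
assumptions are imposed on the strong form of the governing equations given by Eqs. (2)-(8)

| Equation | State vector (columns of X) | Convective flux vector (columns of C) | Diffusive flux vector (columns of D) | Source terms vector (columns of S) |
|---|---|---|---|---|
| Continuity | $\rho$ | $\rho \mathbf{v}$ | $0$ | $0$ |
| Species mass | $\rho Y_i$ | $\rho Y_i \mathbf{v}$ | $\mathbf{j}_i$ | $\omega_i$ |
| Momentum | $\rho \mathbf{v}$ | $\rho \mathbf{v} \mathbf{v}$ | $\boldsymbol{\tau} + p \mathbf{I}$ | $\rho \sum_{i=1}^{n} Y_i \mathbf{f}_i$ |
| Total internal energy | $\rho e_0$ | $\rho e_0 \mathbf{v}$ | $\mathbf{q} + \boldsymbol{\tau} \cdot \mathbf{v} + p \mathbf{v}$ | $\sum_{i=1}^{n} \mathbf{f}_i \cdot \mathbf{n}_i$ |
| Internal energy | $\rho e$ | $\rho e \mathbf{v}$ | $\mathbf{q}$ | $-\boldsymbol{\tau} : \nabla \mathbf{v} - p \nabla \cdot \mathbf{v} + \sum_{i=1}^{n} \mathbf{f}_i \cdot \mathbf{j}_i$ |
| Enthalpy | $\rho h$ | $\rho h \mathbf{v}$ | $\mathbf{q}$ | $-\boldsymbol{\tau} : \nabla \mathbf{v} + \frac{Dp}{Dt} + \sum_{i=1}^{n} \mathbf{f}_i \cdot \mathbf{j}_i$ |
| Temperature | $\rho T$ | $\rho T \mathbf{v}$ | $0$ | $-\frac{1}{c_p} \nabla \cdot \mathbf{q} + \frac{\alpha T}{c_p} \frac{Dp}{Dt} - \frac{1}{c_p} \boldsymbol{\tau} : \nabla \mathbf{v} + \frac{1}{c_p} \sum_{i=1}^{n} \left( h_i (\nabla \cdot \mathbf{j}_i - \omega_i) + \mathbf{f}_i \cdot \mathbf{j}_i \right)$ |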
variables, an example state vector that follows from the conservative form of the
governing equations can be: $\mathbf{X} = [\rho, \rho e, \rho Y_1, \rho Y_2, \dots, \rho Y_{n-1}]$ (the conserved state
vector). For the reasons explained earlier, we only include $n - 1$ independent species
mass fractions. The mass fraction of the most abundant species is most often removed
(Niemeyer et al. 2017). Historically, the specific momentum quantity ($\rho \mathbf{v}$) has not been
included in the state vector in ROM of reacting flows (Sutherland and Parente 2009).
Various other definitions of the state vector, $\mathbf{X}$, can be adopted with the caveat that
the system given by Eq. (13) should not be over-specified (Giovangigli 1999; Hansen
and Sutherland 2018). In the next section, we review several strategies to obtain data
matrices $\mathbf{X}$, $\mathbf{C}$, $\mathbf{D}$ and $\mathbf{S}$.


3 Obtaining Data Matrices for Data-Driven Approaches 


High-dimensional datasets that are typical of reacting flow applications can come
from numerical simulations or experiments. A few types of numerical datasets of
varying complexity often used in the context of ROM are presented in Fig. 1. In particular,
solving the governing equations presented in Sect. 2 for simple reacting systems
is one computational strategy to obtain training data for ROM. Those simple systems
can include zero-dimensional reactors, strained laminar flamelets (Peters 1988),
one-dimensional flames or one-dimensional turbulence (ODT) (Kerstein 1999; Sutherland
et al. 2010; Echekki et al. 2011). With a sufficient number of simplifying assumptions
applied to the governing equations, we can obtain those datasets at a relatively low
computational cost. Relaxing some of those assumptions, on the other hand, moves us
along the axis of increasing complexity of the training data, incorporating more
information about the turbulence-chemistry interaction. At the far end of the complexity
spectrum, we have a full direct numerical simulation (DNS), which yields high-fidelity
data with all spatial and temporal scales directly resolved. Resorting to more
expensive numerical simulations, such as large eddy simulation (LES) or DNS, might
not be necessary for ROM purposes. For instance, ODT datasets have been shown
to reproduce the DNS conditional statistics well (Punati et al. 2011; Abboud et al.
2015; Lignell et al. 2015; Punati et al. 2016) and, being computationally cheaper to
obtain, have been frequently used in the context of ROM (Mirgolbabaei and Echekki 2014;
Mirgolbabaei et al. 2014; Mirgolbabaei and Echekki 2015; Biglari and Sutherland 2015).
For an additional overview of the datasets presented in Fig. 1, the reader is referred
to Zdybał et al. (2022a).

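As an illustrative example, the governing equations for an adiabatic, incompressible,
zero-dimensional reactor simplify to:

$$\frac{dT}{dt} = -\frac{1}{\rho c_p} \sum_{i=1}^{n} h_i \omega_i, \qquad \frac{dY_i}{dt} = \frac{\omega_i}{\rho} \quad \text{for } i = 1, 2, \dots, n-1.$$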
Fig. 1 Schematic overview of training datasets for ROM. As we move along the axis of increasing
complexity, more physical detail is incorporated into the reacting flow simulation


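Since a zero-dimensional reactor represents combustion happening at a single point in
space, all spatial derivatives present in Eqs. (2)-(8) vanish. Collecting all observations
of $T$ and $Y_i$ into a matrix $\mathbf{X}$, and collecting all observations of $-\frac{1}{\rho c_p} \sum_{i=1}^{n} h_i \omega_i$
and $\omega_i / \rho$ into a matrix $\mathbf{S}$, we get

$$\mathbf{X} = \left[ T, Y_1, Y_2, \dots, Y_{n-1} \right] \quad \text{and} \quad \mathbf{S} = \left[ -\frac{1}{\rho c_p} \sum_{i=1}^{n} h_i \omega_i, \frac{\omega_1}{\rho}, \frac{\omega_2}{\rho}, \dots, \frac{\omega_{n-1}}{\rho} \right].$$

Note that even though we have removed the transport equation for the $n$th species,
the temperature equation still couples all species through the $-\sum_{i=1}^{n} h_i \omega_i$ term,
which represents the heat release rate.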
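
The assembly of the two matrices can be sketched in a few lines of NumPy. The function name, the argument layout and the synthetic numbers below are assumptions made for illustration only; in practice the inputs would come from a reactor solver. The sketch makes explicit that the temperature source term sums over all $n$ species, while only $n - 1$ species columns are kept.

```python
import numpy as np

def assemble_zero_d_matrices(T, Y, omega, h, rho, cp):
    """
    Build X = [T, Y_1, ..., Y_{n-1}] and the matching source terms matrix S.

    T:     (N,)   temperature
    Y:     (N, n) mass fractions of all n species
    omega: (N, n) net mass production rates of all n species
    h:     (N, n) species enthalpies
    rho:   (N,)   mixture density
    cp:    (N,)   mixture isobaric specific heat capacity
    """
    # Temperature source: the sum runs over all n species
    temperature_source = -np.sum(h * omega, axis=1) / (rho * cp)
    # Species sources: the last (nth) species is removed
    species_sources = omega[:, :-1] / rho[:, None]
    X = np.column_stack((T, Y[:, :-1]))
    S = np.column_stack((temperature_source, species_sources))
    return X, S

# Synthetic inputs (not physical data): N = 4 observations, n = 3 species
rng = np.random.default_rng(0)
N, n = 4, 3
T = rng.uniform(300.0, 2000.0, size=N)
Y = rng.uniform(size=(N, n))
Y /= Y.sum(axis=1, keepdims=True)            # mass fractions sum to one
omega = rng.normal(size=(N, n))
omega -= omega.mean(axis=1, keepdims=True)   # production rates sum to zero
h = rng.normal(size=(N, n)) * 1e6
rho = np.full(N, 1.0)
cp = np.full(N, 1200.0)

X, S = assemble_zero_d_matrices(T, Y, omega, h, rho, cp)
```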


4 Reduced-Order Modeling 


At this point, we have learned how to construct training datasets which are the start- 
ing point for applying data-driven approaches. It has been a frequent trend in recent 
years to apply dimensionality reduction techniques to combustion datasets, both for 
ROM and for data analysis. In the context of combustion, techniques such as PCA 
(Sutherland and Parente 2009), local PCA (Parente et al. 2009, 2011), kernel PCA 
(Mirgolbabaei and Echekki 2014), t-distributed stochastic neighbor embedding (t- 
SNE) (Fooladgar and Duwig 2018), independent component analysis (ICA) (Gitushi 
et al. 2022), non-negative matrix factorization (NMF) (Zdybał et al. 2022a) or
autoencoders (Zhang et al. 2021) have been used. In this chapter, we focus on using
dimensionality reduction techniques solely for model reduction. We use the premise
that the original dataset, $\mathbf{X}$, of high rank can be efficiently approximated by a matrix
of a much lower rank. The data can then be re-parameterized with the new manifold
parameters (Sutherland et al. 2007). Dimensionality reduction is often coupled
with nonlinear regression to provide a more robust mapping between the manifold
parameters and the quantities of interest. In this section, we review ROM strategies
for reacting flows that include dimensionality reduction and nonlinear regression.


4.1 Data Preprocessing 


The first step towards applying dimensionality reduction is data preprocessing. The
most straightforward way is data normalization (centering and scaling), which
equalizes the importance of physical variables of different numerical ranges. Any
variable $\phi$ in a dataset can be centered and scaled using the general formula
$\tilde{\phi} = (\phi - c)/d$, where $c$ is the center computed as the mean value of $\phi$ and $d$ is the scaling
factor. Other data preprocessing means can include data sampling to tackle imbalance
in sample densities, data subsetting (feature selection), or outlier removal. The effect
of data preprocessing, including scaling and outlier removal, on the resulting LDMs
was studied by Parente and Sutherland (2013). In the discussion that follows, we
assume that the training datasets have been appropriately preprocessed.
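
The centering and scaling formula can be illustrated with a short NumPy sketch. The helper name `center_and_scale` and the synthetic two-column dataset are assumptions for illustration; the scaling factors follow common definitions (auto: standard deviation, Pareto: square root of the standard deviation, VAST: variance divided by the mean, range: maximum minus minimum), and a library such as PCAfold may differ in details such as the variance estimator.

```python
import numpy as np

def center_and_scale(X, scaling="auto"):
    """Apply (X - c) / d column-wise and return the result with c and d."""
    center = X.mean(axis=0)
    std = X.std(axis=0)
    if scaling == "auto":
        scale = std
    elif scaling == "pareto":
        scale = np.sqrt(std)
    elif scaling == "vast":
        scale = std**2 / center
    elif scaling == "range":
        scale = X.max(axis=0) - X.min(axis=0)
    else:
        raise ValueError(f"Unknown scaling: {scaling}")
    return (X - center) / scale, center, scale

# Synthetic data: a temperature-like column and a minor-species-like column
rng = np.random.default_rng(0)
X = np.column_stack((rng.uniform(300.0, 2000.0, 200),
                     rng.uniform(0.0, 1e-4, 200)))

X_auto, centers, scales = center_and_scale(X, "auto")
X_pareto, _, _ = center_and_scale(X, "pareto")

# The transformation is invertible as long as c and d are stored
X_back = X_auto * scales + centers
```

After auto scaling both columns have unit standard deviation, whereas Pareto scaling only partially equalizes the ranges, which retains some of the original relative importance of the variables.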


4.2 Reducing the Number of Governing Equations 


Data-driven model reduction has emerged in recent years with applications to com- 
plex dynamical systems. Model reduction of complex systems typically starts with 
changing the basis to represent the original high-dimensional system. Let $\mathbf{A} \in \mathbb{R}^{Q \times Q}$
be the matrix of modes defining the new basis. The matrix $\mathbf{A}$ can be found directly
from the training data using a dimensionality reduction technique, such as PCA. As
long as $\mathbf{A}$ is constant in space and time, the governing equations of the form presented
in Eq. (13) can be written as:

$$\frac{\partial \mathbf{A} \cdot \mathbf{X}^{\mathsf{T}}}{\partial t} = -\nabla \cdot \mathbf{A} \cdot \mathbf{C}^{\mathsf{T}} - \nabla \cdot \mathbf{A} \cdot \mathbf{D}^{\mathsf{T}} + \mathbf{A} \cdot \mathbf{S}^{\mathsf{T}}, \tag{14}$$

where $\mathbf{X}$ can in general contain all state variables as presented in Sect. 2, or a subset
of those. Equation (14) represents the transformation of the original governing equations
to the new basis defined by $\mathbf{A}$.


4.2.1 Principal Component Transport 
PCA is one dimensionality reduction technique that can be used to obtain the basis
matrix $\mathbf{A}$ by performing eigendecomposition of the data covariance matrix. PCA can
provide optimal reaction variables, PCs, that are linear combinations of the original
thermo-chemical state variables (Sutherland 2004; Sutherland and Parente 2009;
Parente et al. 2009). We can define the matrix of PCs, $\mathbf{Z} \in \mathbb{R}^{N \times Q}$, as $\mathbf{Z} = \mathbf{X}\mathbf{A}$, which
represents the transformation of $\mathbf{X}$ to the new PCA-basis. The governing equations
written in the form of Eq. (13) can be linearly transformed to this new PCA-basis as
per Eq. (14). This yields a new set of transport equations for the PCs:

$$\frac{\partial \mathbf{Z}^{\mathsf{T}}}{\partial t} = -\nabla \cdot \mathbf{C_Z}^{\mathsf{T}} - \nabla \cdot \mathbf{D_Z}^{\mathsf{T}} + \mathbf{S_Z}^{\mathsf{T}}, \tag{15}$$

where $\mathbf{C_Z} = \mathbf{C}\mathbf{A}$ are the projected convective fluxes, $\mathbf{D_Z} = \mathbf{D}\mathbf{A}$ are the projected
diffusive fluxes and $\mathbf{S_Z} = \mathbf{S}\mathbf{A}$ are the PC source terms, that is, the source terms of the
original state-space variables transformed to the new PCA-basis. We will further
refer to the $j$th PC (the $j$th column of $\mathbf{Z}$) as $Z_j$ and to the $j$th PC source term
(the $j$th column of $\mathbf{S_Z}$) as $S_{Z,j}$. By solving the transport equations for the first $q$
PCs only, we can significantly reduce the number of PDEs in Eq. (15) as compared
to Eq. (13). PCA further guarantees that the first $q$ PCs are the most important
ones in terms of the variance retained in the data. From the Eckart-Young theorem
(Eckart and Young 1936), we know that approximating the dataset $\mathbf{X}$ with only the first
$q$ PCs gives the closest rank-$q$ approximation to $\mathbf{X}$. This approximation can be
obtained through an inverse basis transformation: $\mathbf{X} \approx \mathbf{Z}_q \mathbf{A}_q^{\mathsf{T}}$, where the subscript
$q$ denotes truncation to $q$ components. With the PCA modeling approach, the first $q$
PCs become the reaction variables that re-parameterize the original thermo-chemical
state-space. They also define the $q$-dimensional manifold, embedded in the originally
$Q$-dimensional state space.
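
The steps described above can be sketched with plain NumPy. This is a minimal illustration on synthetic data, not the PCAfold implementation; the function name `pca_basis` and all numerical values are assumptions. The source terms are scaled with the same factors as the state variables but are not centered, because the constant center vanishes when the scaled state is differentiated in time.

```python
import numpy as np

def pca_basis(X_cs):
    """Eigendecomposition of the covariance matrix of centered and scaled data.

    Returns the basis matrix A (Q x Q), with columns sorted by decreasing
    eigenvalue, and the sorted eigenvalues.
    """
    covariance = np.cov(X_cs, rowvar=False)
    eigenvalues, A = np.linalg.eigh(covariance)   # ascending order
    order = np.argsort(eigenvalues)[::-1]
    return A[:, order], eigenvalues[order]

# Synthetic data: N = 500 observations of Q = 5 variables near a 2D subspace
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(500, 5))
S = rng.normal(size=(500, 5))   # placeholder source terms

# Auto scaling: center with the mean, scale with the standard deviation
centers = X.mean(axis=0)
scales = X.std(axis=0)
X_cs = (X - centers) / scales

A, eigenvalues = pca_basis(X_cs)

Z = X_cs @ A               # all Q PCs
S_Z = (S / scales) @ A     # PC source terms: scaled but not centered

# Rank-q approximation through the inverse basis transformation
q = 2
X_rec = Z[:, :q] @ A[:, :q].T
relative_error = np.linalg.norm(X_cs - X_rec) / np.linalg.norm(X_cs)
```

Because the synthetic data lie close to a two-dimensional subspace, the relative reconstruction error with $q = 2$ is small, while keeping all $Q$ components recovers the scaled data exactly.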

The formulation of PC-transport was first proposed by Sutherland and Parente (2009).
Since then, numerous a priori (Biglari and Sutherland 2012; Mirgolbabaei and
Echekki 2013; Mirgolbabaei et al. 2014; Malik et al. 2018; Ranade and Echekki
2019; Dalakoti et al. 2020; D’Alessio et al. 2022; Zdybał et al. 2022c) and a posteriori
(Isaac et al. 2014; Biglari and Sutherland 2015; Echekki and Mirgolbabaei
2015; Coussement et al. 2016; Owoyele and Echekki 2017; Ranade and Echekki
2019; Malik et al. 2020, 2022a,b) studies have been conducted. The advantage of
PCA-based modeling is that models can be trained on datasets coming from simpler
systems that are cheap to compute (such as zero-dimensional reactors or laminar
flamelets, see Sect. 3). This has been shown to be a feasible modeling strategy (Malik
et al. 2018, 2020), as long as the training data covers the possible states of the reacting
system that might be accessed during simulation of real systems.

There are a few additional ingredients of the PC-transport modeling approach. 
First, since Eq. (15) is solved for the PCs which do not have any physical relevance, we 
require a mapping back to the original thermo-chemical state-space, where physical 
quantities of interest can be retrieved. Second, we need to parameterize the source 
terms, Sz, of any non-conserved manifold parameters (Sutherland 2004; Sutherland 
and Parente 2009). While in the original state space we have known relations between 
the transported variables and their source terms, we lack such explicit relations in the 
space of PCs. Both these points can be handled by coupling nonlinear regression with 
the PC-transport model—this will be further discussed in Sect. 4.4. Finally, in the 
presence of diffusion, diffusive fluxes need to be represented in the new PCA-basis
as well. Treatment of PC diffusive fluxes was proposed by Mirgolbabaei and Echekki 
(2014) and by Biglari and Sutherland (2015). A study by Echekki and Mirgolbabaei 
(2015) further looked into mitigating the multicomponent effects associated with 
diffusion of PCs. Another study by Coussement et al. (2016) looked at the influence 
of differential diffusion on PCA-based models. The work done in (Coussement et al. 
2016) looked at how rotation of the PCs can diagonalize the PCs diffusion coefficients 
matrix and thus make the treatment of diffusion of PCs easier. 


Computing the PCs and the PC source terms 

In this example, we demonstrate how one can obtain the PCs and the 
PC source terms from the state vector, X, and the source terms vector, 
S, respectively. We use a syngas/air steady laminar flamelet dataset and 
generate its two-dimensional (2D) projection onto the PCA-basis. The 
dataset was generated using Spitfire Python library (Hansen et al. 
2022) and the chemical mechanism by Hawkes et al. (2007). 

Load the dataset, removing the nth species, N2:

import numpy as np

# State vector and its source terms; the last column (N2) is dropped
X = np.genfromtxt('syngas-air-SLF-state-space.csv', delimiter=',')[:,0:-1]
S = np.genfromtxt('syngas-air-SLF-state-space-sources.csv', delimiter=',')[:,0:-1]
# Mixture fraction and dissipation rates (the name mixture_fraction is assumed here)
mixture_fraction = np.genfromtxt('syngas-air-SLF-mixture-fraction.csv', delimiter=',')
chi = np.genfromtxt('syngas-air-SLF-dissipation-rates.csv', delimiter=',')

(n_observations, n_variables) = X.shape


Perform PCA on the dataset:

pca = reduction.PCA(X, scaling='auto', n_components=2)

Transform the state vector, X, to the new PCA basis:

Z = pca.transform(X)

Transform the source terms vector, S, to the new PCA basis (note the
nocenter=True flag):

S_Z = pca.transform(S, nocenter=True)


Visualize the 2D projection of the dataset, colored by the two PC source
terms, $S_{Z,1}$ and $S_{Z,2}$ (Fig. 2):

plt = reduction.plot_2d_manifold(Z[:,0],
                                 Z[:,1],
                                 color=S_Z[:,0],
                                 s=15,
                                 x_label='$Z_{1}$ [$-$]',
                                 y_label='$Z_{2}$ [$-$]',
                                 colorbar_label='$S_{Z, 1}$\n[$-$]',
                                 color_map='inferno',
                                 grid_on=True,
                                 figure_size=(6,4))
Fig. 2 Outputs of reduction.plot_2d_manifold


It is visible from the plot above that this 2D projection introduces significant
non-uniqueness that particularly affects the dependent variable
$S_{Z,1}$. At the same time, this visible overlap in the $(Z_1, Z_2)$ space does not
coincide with the region of the largest variation in the values of the second
PC source term, $S_{Z,2}$. We can expect that $S_{Z,1}$ will be much more strongly
affected by the manifold non-uniqueness than $S_{Z,2}$.


4.3 Low-Dimensional Manifold Topology 


Apart from PCA, numerous manifold learning methods can help identify LDMs 
in high-dimensional combustion datasets. Although the approach presented in 
Sect. 4.2.1 allows for substantial model reduction, several manifold challenges need 
to be addressed. In particular, during projection of data to a lower-dimensional basis, 
non-uniqueness can be introduced in the manifold topology which can hinder suc- 
cessful model definition. A good model should provide unique definition of all rele- 
vant dependent variables as functions of the manifold parameters (Sutherland 2004; 
Pope 2013). With this premise, the future research directions can be twofold. First, 
we require techniques to characterize the quality of LDMs. Second, we should seek 
strategies that provide an improved manifold topology. Both points should feed one 
another and can be tackled simultaneously. 

Measures such as the coefficient of determination (Biglari and Sutherland 2012) 
or manifold nonlinearity (Isaac et al. 2014) have been used in the past to assess man- 
ifold parameterizations a priori. A recently proposed normalized variance derivative 
metric (Armstrong and Sutherland 2021) is much more informative in comparison. It 
can characterize manifold quality with respect to two important aspects: feature sizes 
and multiple scales of variation in the dependent variable space. Multiple scales of 
variation can often indicate non-uniqueness in manifold parameterization. A more 
compact metric based on the normalized variance derivative has also been proposed 
recently (Zdybał et al. 2022b). It reduces the manifold topology to a single number
and can be used as a cost function in manifold optimization tasks.

Some topological challenges can be mitigated through appropriate data preprocessing
prior to projecting to a lower-dimensional space. The most straightforward
strategy is data scaling, with Pareto (Noda 2008) or VAST (Hector et al. 2003) scal- 
ings most commonly used (Biglari and Sutherland 2015; Isaac et al. 2015; Malik et al. 
2018, 2020). Other authors have tackled manifold challenges by training combustion 
models on only a subset of the original thermo-chemical state-space variables (Chat- 
zopoulos and Rigopoulos 2013; Mirgolbabaei and Echekki 2013, 2014; Echekki and 
Mirgolbabaei 2015; Isaac et al. 2015; Owoyele and Echekki 2017; Malik et al. 2020; 
Nguyen et al. 2021; Gitushi et al. 2022). Recent work developed a strategy for a
manifold-informed state vector subset selection (Zdybał et al. 2022b). A study done
by Coussement et al. (2012) suggests that tackling initial imbalance in data density 
can yield a more accurate low-dimensional representation of the flame region. 

Another important decision that needs to be made at the modeling stage is which
manifold dimensionality, $q$, to select. Additional parameters
may be required for more complex manifold topologies. While techniques such
as PCA provide orthogonal manifold parameters (PCs), each bringing information
about variance in another orthogonal data dimension, it is not clear how many PCs
are sufficient to provide a good-quality, regressible manifold topology. From the
computational cost point of view, keeping the manifold dimensionality low is desired.
However, keeping $q$ small should not come at the expense of the parameterization
quality. Admittedly, more work is required to provide answers to those questions.


Low-dimensional manifold assessment 

Below, we demonstrate how we can assess the quality of LDMs obtained
from PCA using the novel normalized variance derivative metric (Armstrong
and Sutherland 2021). We will assess the generated 2D projection,
taking the two PC source terms as the two dependent variables.
Define the bandwidth values, $\sigma$:


bandwidth_values = np.logspace(-5, 1, 100)


Specify the names of the dependent variables:


variable_names = ['$S_{Z,1}$', '$S_{Z,2}$']


Compute the normalized variance derivative, $\hat{\mathcal{D}}(\sigma)$:


variance_data = analysis.compute_normalized_variance(Z, S_Z,
                                                     variable_names,
                                                     bandwidth_values=bandwidth_values)


Plot the $\hat{\mathcal{D}}(\sigma)$ curves for the two PC source terms (Fig. 3):


analysis.plot_normalized_variance_derivative(variance_data,
                                             color_map='Greys',
                                             figure_size=(10,2.5))
Fig. 3 Output of analysis.plot_normalized_variance_derivative


The normalized variance derivative, $\hat{\mathcal{D}}(\sigma)$, quantifies the information
content on a manifold at various length scales specified by the bandwidth,
$\sigma$. The peaks in the $\hat{\mathcal{D}}(\sigma)$ profile happening at very small length scales
can often be linked to non-uniqueness in manifold topologies. In the plot
above, we can observe two distinct peaks in the $\hat{\mathcal{D}}(\sigma)$ curve
for the first PC source term, $S_{Z,1}$. The peak happening for smaller $\sigma$ can
be understood from our visualization of the manifold topology in Fig. 2.
In our visualization we have seen clear overlap, where the observations
corresponding to highly negative values of $S_{Z,1}$ were projected directly
above observations corresponding to $S_{Z,1} \approx 0$. The information provided
by $\hat{\mathcal{D}}(\sigma)$ is valuable at the modeling stage, as it allows us to quantitatively
assess the quality of low-dimensional data projections.
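
To make the idea behind the metric more tangible, the sketch below computes a simplified version of the normalized variance and of its scaled derivative with NumPy. It is not the PCAfold implementation: the Gaussian kernel form, the scaling of the derivative by its maximum, the function names and the synthetic one-dimensional "manifolds" are all assumptions for illustration, and the normalization used by Armstrong and Sutherland (2021) may differ in detail.

```python
import numpy as np

def normalized_variance(Z, phi, bandwidths):
    """Simplified normalized variance of phi on a manifold Z.

    For each bandwidth sigma, every observation phi_i is compared with a
    Gaussian-kernel weighted average K_i of phi over the manifold:
    N(sigma) = sum_i (phi_i - K_i)^2 / sum_i (phi_i - mean(phi))^2.
    """
    phi = np.asarray(phi, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(phi.size, -1)
    squared_distances = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=2)
    total_variation = np.sum((phi - phi.mean()) ** 2)
    N = np.empty(len(bandwidths))
    for k, sigma in enumerate(bandwidths):
        weights = np.exp(-squared_distances / sigma**2)
        K = (weights @ phi) / weights.sum(axis=1)
        N[k] = np.sum((phi - K) ** 2) / total_variation
    return N

def scaled_derivative(N, bandwidths):
    """Derivative of N with respect to log10(sigma), scaled by its maximum."""
    derivative = np.gradient(N, np.log10(bandwidths))
    return derivative / derivative.max()

bandwidths = np.logspace(-5, 1, 61)
x = np.linspace(0.0, 1.0, 100)

# Unique parameterization: one smooth branch
Z_unique, phi_unique = x, x**2

# Non-unique parameterization: a second branch with very different values
# of phi is projected almost directly on top of the first one
Z_overlap = np.concatenate((x, x + 1e-3))
phi_overlap = np.concatenate((x**2, x**2 + 1.0))

N_unique = normalized_variance(Z_unique, phi_unique, bandwidths)
N_overlap = normalized_variance(Z_overlap, phi_overlap, bandwidths)

sigma_peak_unique = bandwidths[np.argmax(scaled_derivative(N_unique, bandwidths))]
sigma_peak_overlap = bandwidths[np.argmax(scaled_derivative(N_overlap, bandwidths))]
```

In this toy setting, the overlapping branches make the normalized variance rise already at bandwidths comparable to the tiny separation between the branches, so the dominant peak of the derivative moves to much smaller length scales than for the unique parameterization.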


4.4 Nonlinear Regression 


Nonlinear regression is often used to provide an effective mapping between the mani- 
fold parameters and the dependent variables of interest (Biglari and Sutherland 2015; 
Mirgolbabaei and Echekki 2015; Malik et al. 2018; Dalakoti et al. 2020). The set of
dependent variables, $\boldsymbol{\phi}$, typically includes the PC source terms, $\mathbf{S_Z}$, and the
thermo-chemical state-space variables, such as temperature, density and composition. Unlike
the inverse basis transformation discussed in Sect. 4.2.1, regression has the potential
to yield much more accurate dependent variable reconstructions (Mirgolbabaei
and Echekki 2015). Nonlinear regression techniques allow us to encode nonlinear 
relationships between the manifold parameters and the dependent variables. This 
characteristic is especially desired for modeling source terms, which are highly non- 
linear functions of the independent variables. In the past research, reconstruction of 
the PC source terms has been shown to be much more challenging than reconstruction 
of the state variables (Biglari and Sutherland 2012, 2015). This is due to the fact that 
the state-space variables evolve nonlinearly according to the Arrhenius relations. 




In this section, we are concerned with a set of $n_\phi$ dependent variables defined
as $\boldsymbol{\phi} = [\mathbf{S_Z}, T, \rho, \mathbf{Y}_i]$, where $\mathbf{Y}_i$ is a vector of $n - 1$ species mass fractions,
$\mathbf{Y}_i = [Y_1, Y_2, \dots, Y_{n-1}]$. In mathematical terms, the goal of nonlinear regression is to find
a function $\mathcal{F}$, such that:

$$\boldsymbol{\phi} \approx \mathcal{F}(\mathbf{Z}_q), \tag{16}$$

where $\boldsymbol{\phi}$ is a dependent variable and $\mathbf{Z}_q$ are the first $q$ PCs. It is worth noting that
some regression techniques allow us to obtain all dependent variables at once; others
require regressing dependent variables one-by-one. Three popular nonlinear regression
techniques are reviewed in this section. Our main focus is on presenting how the
function $\mathcal{F}$ is defined in each technique.


Nonlinear regression 

In the examples that follow, we will perform and assess ANN, GPR and 
kernel regression of the two PC source terms defined earlier. The nonlinear 
regression models will be trained on 80% and tested on the remaining 20% 
of the data. Below, we use the sampling functionalities to randomly sample 
train and test data: 


sample_random = preprocess.DataSampler(np.zeros((n_observations,)).astype(int),
                                       random_seed=100,
                                       verbose=True)
(idx_train, idx_test) = sample_random.random(80)
Z_train = Z[idx_train,:]; Z_test = Z[idx_test,:]
S_Z_train = S_Z[idx_train,:]; S_Z_test = S_Z[idx_test,:]
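
For readers who wish to see what such a random split amounts to when all observations belong to a single cluster, the following NumPy sketch is a rough equivalent. The function name is hypothetical and the selected indices will not match those returned by the PCAfold sampler; only the train and test proportions are reproduced.

```python
import numpy as np

def random_train_test_indices(n_observations, train_percentage, random_seed=None):
    """Randomly split observation indices into train and test sets."""
    rng = np.random.default_rng(random_seed)
    shuffled = rng.permutation(n_observations)
    n_train = int(round(n_observations * train_percentage / 100))
    # Sorting keeps the original ordering of observations within each set
    return np.sort(shuffled[:n_train]), np.sort(shuffled[n_train:])

# Synthetic stand-ins for the PCs and the PC source terms
rng = np.random.default_rng(0)
Z_demo = rng.normal(size=(1000, 2))
S_Z_demo = rng.normal(size=(1000, 2))

idx_train_demo, idx_test_demo = random_train_test_indices(1000, 80, random_seed=100)
Z_train_demo, Z_test_demo = Z_demo[idx_train_demo, :], Z_demo[idx_test_demo, :]
S_Z_train_demo, S_Z_test_demo = S_Z_demo[idx_train_demo, :], S_Z_demo[idx_test_demo, :]
```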


4.4.1 Artificial Neural Network 


Artificial neural networks (ANNs) are networks of connected layers that compute
the output(s) based on some convolution of the layer’s input(s) (Russell and Norvig 
2002). The layer’s inputs and outputs are called neurons. ANNs form a parametric 
technique that can be used both for regression and classification and are broadly used 
in the context of ROM. This applies to both reacting (Mirgolbabaei and Echekki 2013, 
2014, 2015; Echekki and Mirgolbabaei 2015; Ranade and Echekki 2019; Dalakoti 
et al. 2020; Zhang et al. 2020) and non-reacting (pure fluid) applications (Farooq 
et al. 2021). 

For an architecture with a single neural layer (input — output), the regression 
function ¥ at some query point P can be written as: 


F |» = 81(Zq|pWi + bi), (17) 


where W; € R?*"* is the matrix of weights and b; € R!*"# is the vector of biases, 
and g; is the activation function. Both W; and b; are learned from the training data by 
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solving an optimization problem. For a deep neural network (DNN) which allow for 
multi-layer architecture, the regression function becomes a composition of functions 
of the form shown in Eq. (17). Assuming m neural layers, we can write that 


F| p = 8m(8m-1 ++ 8281 (Zq| p Wi + b1)W2 + b2) ++» Win—1 + bm—1) Wm + bm), 
(18) 


where the matrices $W_l$ and the vectors $b_l$ for layers $l = 1, 2, \dots, m$ do not need
to be of the same size, since the number of neurons can vary between layers. The
activation functions $g_l$ can also vary between layers. Equation (18) essentially states
in matrix notation that the output of one layer becomes the input of the following layer.
The advantage of using ANN regression is that predictions are relatively cheap to
compute once the ANN model has been trained. As can be seen from Eqs. (17)-(18),
predicting a single observation of $\phi$ given a set of query inputs, $Z_q|_P$, requires
vector-matrix multiplication(s), where $W_l$ is typically a small matrix. This makes ANNs
very appealing from the computational cost point of view. However, the optimization
used to determine weights and biases is prone to converging to a local minimum. The best
one can hope for is that the local minimum will result in reasonable predictions. The
overall performance of the trained network is dependent on many factors that the
user can tune, such as the architecture or the choice of the activation function(s).
The ANN predictions are also dependent on the random initial guess for the weights
and biases, which can greatly affect gradient descent-based algorithms. To improve
the network performance, Bayesian optimization can be used to determine the ANN
hyper-parameters (Mockus 2012; Bergstra et al. 2013; Barzegari and Geris 2021).
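
The layer-by-layer composition in Eq. (18) can be illustrated with a minimal NumPy sketch. It is not the Keras implementation used in the example that follows: the weights and biases here are random placeholders rather than trained values, and the helper names (forward, sigmoid, identity) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def identity(x):
    return x

def forward(Z_query, weights, biases, activations):
    """Evaluate the composition of layers, feeding each output to the next layer."""
    output = Z_query
    for W, b, g in zip(weights, biases, activations):
        output = g(output @ W + b)
    return output

rng = np.random.default_rng(0)

# Assumed architecture: q = 2 inputs, two hidden layers of five neurons, 2 outputs.
layer_sizes = [2, 5, 5, 2]
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=(1, n_out)) for n_out in layer_sizes[1:]]
activations = [sigmoid, sigmoid, identity]

# Three query points, each with q = 2 coordinates.
Z_query = rng.normal(size=(3, 2))
prediction = forward(Z_query, weights, biases, activations)
print(prediction.shape)
```

Once the weights are fixed, each prediction costs only a handful of small matrix products, which is the reason the text describes ANN evaluation as cheap.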


ANN regression 

In this example, we create an ANN model to obtain the parameterizing
function, $\mathcal{F}$. We will use a popular Python library for ANNs, Keras
(Chollet et al. 2015), which provides a high-level interface to the TensorFlow
software (Abadi et al. 2015). Below, we import the necessary libraries:


from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
from keras import losses


We use a relatively simple architecture with two hidden layers with five 
neurons each: 


model = Sequential([
    Dense(5, input_dim=2, activation='sigmoid'),
    Dense(5, activation='sigmoid'),
    Dense(2, activation='linear')])


Normalize the ANN outputs to the (-1; 1) range:


(normalized_S_Z, centers, scales) = preprocess.center_scale(S_Z, '-1to1')


Sample the normalized train data outputs:


normalized_S_Z_train = normalized_S_Z[idx_train,:]


Compile the ANN model with the given architecture:


model.compile(optimizers.Adam(lr=0.001),
              loss=losses.mean_squared_error,
              metrics=['mse'])


Fit the compiled ANN model with the training data, specifying the 
hyper-parameters: 


history = model.fit(Z_train,
                    normalized_S_Z_train,
                    batch_size=100, epochs=500,
                    validation_split=0.2, verbose=0)


Finally, we predict the two PC source terms, remembering to invert the
(-1; 1) normalization applied initially:


S_Z_ANN_predicted = model.predict(Z)
S_Z_ANN_predicted = preprocess.invert_center_scale(S_Z_ANN_predicted,
                                                   centers, scales)


We can visualize the regression result in 3D (Fig. 4): 


analysis.plot_3d_regression(Z[:,0],
                            Z[:,1],
                            S_Z[:,0],
                            S_Z_ANN_predicted[:,0],
                            elev=30,
                            azim=200,
                            x_label='$Z_1$ [$-$]',
                            y_label='$Z_2$ [$-$]',
                            z_label='$S_{Z, 1}$ [$-$]',
                            figure_size=(12,6))


Fig. 4 Outputs of analysis.plot_3d_regression


The figure above demonstrates qualitatively how regression can struggle
to regress dependent variables on an ill-behaved manifold. We can
observe regions with large mismatch between the observed and the predicted
values of the two PC source terms. In particular, highly negative
values of $S_{Z,1}$ are poorly predicted. This behavior can be linked to our
manifold topology assessments in the earlier examples, where we have
seen non-uniqueness affecting highly negative values of $S_{Z,1}$.


4.4.2 Gaussian Process Regression


Gaussian process regression (GPR) is a kernel-based, semi-parametric regression 
technique (Williams and Rasmussen 2006). A powerful characteristic of GPR is 
that prior knowledge about the functional relationship between the independent and 
dependent variables can be injected at the modeling stage. For instance, if the system 
dynamics is known to have an oscillatory behavior, the kernel can be built using a 
periodic function. Another important feature of GPR is that it provides uncertainty 
bounds on the predicted variables, while techniques such as ANN or kernel regression 
only provide predictions. 
In GPR, the regression function $\mathcal{F}$ is learned from the data:


$$\mathcal{F}(Z_q) = \mathcal{GP}(m(Z_q), K(Z_q, Z_q')), \qquad (19)$$


where $\mathcal{GP}$ denotes a Gaussian process, $m$ is the mean function and $K$ is the covariance
matrix. The covariance matrix, $K \in \mathbb{R}^{N \times N}$, can be populated using any kernel of
choice as long as the elements in $K$ satisfy $k_{i,j} = k_{j,i}$ $\forall i \neq j$. Typically, kernels are
functions of the distance between data observations, $x_i$ and $x_j$. The squared exponential
kernel is commonly used to populate $K$:


$$k_{i,j} = h^2 \exp\left(-\frac{\|x_i - x_j\|^2}{\lambda^2}\right), \qquad (20)$$


where $h$ is the scaling factor and $\lambda$ is the bandwidth of the kernel. Figure 5a
visualizes the effect of increasing the kernel bandwidth, $\lambda$, on the resulting covariance
matrix structure. With a larger $\lambda$, we are allowing observations that are further apart
to correlate. The structure of $K$ is then reflected in possible regression function
realizations (Fig. 5b). With a very narrow kernel (here $\lambda = 0.2$), the resulting realization
looks very noisy: even nearby observations can have very different function values.
The larger the kernel bandwidth, the smoother the realization function (Duvenaud
2014). With $\lambda = 5$ we can expect stronger correlation in function values even for
observations that are further away. Figure 5c additionally shows a histogram of one
hundred regression function realizations resulting from $\lambda = 5$. Since in this example
we have chosen the mean equal to 10, the histogram has a Gaussian distribution
centered around 10.


Fig. 5 The effect of kernel bandwidth on smoothing the Gaussian process regression predictions.
In this example, the scaling factor $h = 0.1$. a Heatmaps of three covariance matrices, $K$, generated
using the squared exponential kernel with an increasing kernel bandwidth, $\lambda$ ($\lambda = 0.2$, $\lambda = 1$
and $\lambda = 5$). b Example regression function realizations resulting from each covariance matrix.
c Histogram of one hundred function realizations corresponding to the $\lambda = 5$ case with the mean
equal to 10. The mean dictates the most probable function value
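
The link between the kernel bandwidth and the smoothness of realizations can be reproduced with a short NumPy sketch. It assumes a one-dimensional input, a squared exponential kernel of the form $h^2 \exp(-(x_i - x_j)^2 / \lambda^2)$ and a constant mean of 10; the function name squared_exponential and the grid of 50 points are illustrative choices, not taken from the chapter.

```python
import numpy as np

def squared_exponential(x, h, lam):
    """Covariance matrix with entries h^2 * exp(-(x_i - x_j)^2 / lam^2) (assumed kernel form)."""
    diff = x[:, None] - x[None, :]
    return h**2 * np.exp(-diff**2 / lam**2)

x = np.linspace(0.0, 10.0, 50)
h = 0.1
rng = np.random.default_rng(0)
mean = np.full(x.size, 10.0)

realizations = {}
for lam in (0.2, 1.0, 5.0):
    K = squared_exponential(x, h, lam)
    # A small jitter on the diagonal keeps the matrix numerically positive semi-definite.
    K_jittered = K + 1e-10 * np.eye(x.size)
    # One hundred function realizations drawn from the Gaussian process prior.
    realizations[lam] = rng.multivariate_normal(mean, K_jittered, size=100)

for lam, samples in realizations.items():
    # Mean absolute jump between neighboring points: a simple proxy for roughness.
    roughness = np.mean(np.abs(np.diff(samples, axis=1)))
    print(f"lambda = {lam}: roughness = {roughness:.4f}")
```

The roughness measure should shrink as the bandwidth grows, mirroring the qualitative trend described for Fig. 5b.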


GPR regression 

In this example, we create a GPR model to obtain the parameterizing
function, $\mathcal{F}$. We will use the Python package george (Ambikasaran et al.
2016) to perform GPR:


import george 


Create the squared exponential kernel: 


kernel = george.kernels.ExpSquaredKernel(20, ndim=2)


Fit the GPR model with the training data: 


gp = george.GP(kernel)
gp.compute(Z_train, yerr=1.25e-12)


Predict the two PC source terms: 


S_Z1_GPR_predicted, S_Z1_GPR_var = gp.predict(S_Z_train[:,0], Z,
                                              return_var=True)
S_Z2_GPR_predicted, S_Z2_GPR_var = gp.predict(S_Z_train[:,1], Z,
                                              return_var=True)


We visualize the predicted PC source terms (Fig. 6): 
Fig. 6 Outputs of analysis.plot_3d_regression


In the plot above, we observe similar misprediction of the first PC source
term, $S_{Z,1}$, as we have seen with ANN regression.


4.4.3 Kernel Regression


Kernel regression is a nonparametric technique that does not include the “training”
step. The function $\mathcal{F}$ is inferred for each query point, $P$, directly from the training
data samples in some vicinity of $P$. The regression function $\mathcal{F}$ is built from the
Nadaraya-Watson estimator (Härdle 1990) as:


$$\mathcal{F}|_P = \frac{\sum_{i=1}^{N} K_{i,P}(Z_q, \sigma)\, \phi_i}{\sum_{i=1}^{N} K_{i,P}(Z_q, \sigma)}, \qquad (21)$$


where $K$ is the kernel function and $\sigma$ is the kernel bandwidth. Equation (21) essentially
represents a linear combination of the weighted observations of $\phi$. As in
GPR, various kernels can be used in place of $K$. The most popular Gaussian kernel
yields:


$$K_{i,P}(Z_q, \sigma) = \exp\left(-\frac{\|Z_q|_i - Z_q|_P\|_2^2}{\sigma^2}\right). \qquad (22)$$


The larger the kernel bandwidth, $\sigma$, the larger the resulting coefficients $K_{i,P}$
multiplying each data observation, $\phi_i$. In other words, an increasing $\sigma$ yields a stronger
influence of data observations distant from $P$ on the predicted function value at $P$.
A larger $\sigma$ therefore makes $\mathcal{F}$ a smoother function; note the similarity of this
concept with the covariance matrix discussion in Sect. 4.4.2.
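
The Nadaraya-Watson estimator with a Gaussian kernel can be written in a few lines of NumPy. The sketch below is not the PCAfold implementation; it assumes the kernel form $\exp(-\|\cdot\|^2 / \sigma^2)$, and the function name kernel_regression and the synthetic data are hypothetical.

```python
import numpy as np

def kernel_regression(Z_train, phi_train, Z_query, bandwidth):
    """Nadaraya-Watson estimate at each query point using a Gaussian kernel (assumed form)."""
    # Squared Euclidean distance between every query point and every training point.
    sq_dist = ((Z_query[:, None, :] - Z_train[None, :, :])**2).sum(axis=2)
    K = np.exp(-sq_dist / bandwidth**2)          # shape: (n_query, N)
    # Kernel-weighted average of the training observations.
    return (K @ phi_train) / K.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Synthetic two-dimensional manifold with a single dependent variable.
Z_train = rng.uniform(-1.0, 1.0, size=(200, 2))
phi_train = np.sin(3.0 * Z_train[:, :1]) + Z_train[:, 1:]**2

narrow = kernel_regression(Z_train, phi_train, Z_train, bandwidth=0.01)
wide = kernel_regression(Z_train, phi_train, Z_train, bandwidth=1000.0)

print("max deviation from data, narrow kernel:", np.max(np.abs(narrow - phi_train)))
print("spread of predictions, wide kernel:", np.ptp(wide))
```

With a very narrow kernel the prediction follows the training data closely, while a very wide kernel flattens the prediction towards the global mean, which is the smoothing effect described above taken to its extreme.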


Kernel regression 

In this example, we create a kernel regression model to obtain the parameterizing
function, $\mathcal{F}$. We specify the kernel bandwidth, $\sigma$, for the
Nadaraya-Watson estimator:


bandwidth = 0.5 


Fit the kernel regression model with the training data: 


model = analysis.KReg(Z_train, S_Z_train)


Predict the two PC source terms: 


S_Z_KReg_predicted = model.predict(Z, bandwidth=bandwidth)


As before, we visualize the predicted PC source terms (Fig. 7):


Fig. 7 Outputs of analysis.plot_3d_regression


Since kernel regression makes predictions by “smoothing out” function
values over some neighborhood of a query point, the non-uniqueness in
$S_{Z,1}$ values affected regression performance, similarly to what we have
observed with ANN and GPR regression.


Nonlinear regression assessment 

Here, we continue the kernel regression example and use various metrics to
assess the regression performance. Two common metrics that are available
are the coefficient of determination, $R^2$, and the normalized root mean
squared error (NRMSE). For vector quantities, such as the PC source
terms vector, another useful metric might be the good direction estimate
(GDE), which is a measure derived from cosine similarity.
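
As a hedged illustration of the first two metrics, the sketch below computes $R^2$ and the NRMSE with plain NumPy. It assumes the textbook definition of $R^2$ and an NRMSE normalized by the standard deviation of the observed values; the function names are hypothetical and this is not the PCAfold code.

```python
import numpy as np

def coefficient_of_determination(observed, predicted):
    """R2 = 1 - (residual sum of squares) / (total sum of squares)."""
    ss_res = np.sum((observed - predicted)**2)
    ss_tot = np.sum((observed - observed.mean())**2)
    return 1.0 - ss_res / ss_tot

def nrmse(observed, predicted):
    """Root mean squared error normalized by the standard deviation of the observations."""
    rmse = np.sqrt(np.mean((observed - predicted)**2))
    return rmse / observed.std()

rng = np.random.default_rng(0)
observed = rng.normal(size=1000)
predicted = observed + 0.5 * rng.normal(size=1000)

r2 = coefficient_of_determination(observed, predicted)
error = nrmse(observed, predicted)
print(f"R2 = {r2:.4f}, NRMSE = {error:.4f}")
```

Under these definitions the two metrics are tied by NRMSE = sqrt(1 - R2), so they carry the same information for a single variable. The values reported in Fig. 8 follow this relation to within rounding, for example sqrt(1 - 0.48) ≈ 0.7211.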

Compute the regression metrics for the two PC source terms: 


metrics = analysis.RegressionAssessment(S_Z, S_Z_KReg_predicted,
                                        variable_names=variable_names,
                                        norm='std',
                                        tolerance=0.05)


Display the regression metrics in a table format (Fig. 8): 


metrics.print_metrics(table_format=['pandas'], metrics=['R2', 'NRMSE', 'GDE'])


             R2       NRMSE    GDE
$S_{Z,1}$    0.4800   0.7211   45.2449
$S_{Z,2}$    0.8100   0.4358   45.2449


Fig. 8 Output of analysis.RegressionAssessment.print_metrics.


The RegressionAssessment class also allows comparing two
regression results. It can color-code the displayed table, marking the
metrics that got worse in red and those that got better in green.

In addition to a single value of each metric for the entire dataset, we can
also compute stratified metrics values, in bins (clusters) of a dependent
variable. This allows us to observe how regression performed in specific
regions of the manifold. Below, we compute the stratified metrics in four
bins of the first PC source term, $S_{Z,1}$. We then look at the kernel
regression of the first PC source term in each bin.

We first use the function from the preprocess module that allows
manual partitioning of the dataset into bins of a selected variable. Compute
the bins:


(idx, _) = preprocess.predefined_variable_bins(S_Z[:,0],
                                               split_values=[-10000, 0, 10000],
                                               verbose=False)


Those data bins (clusters) are visualized below on the syngas/air flamelet
dataset in the space of mixture fraction and temperature (Fig. 9):


preprocess.plot_2d_clustering(f, X[:,0], idx,
                              x_label='$f$ [-]', y_label='$T$ [$K$]',
                              first_cluster_index_zero=False,
                              color_map='coolwarm',
                              figure_size=(8,4))


Fig. 9 Output of preprocess.plot_2d_clustering


Compute the stratified regression metrics: 


metrics = analysis.RegressionAssessment(S_Z[:,0], S_Z_KReg_predicted[:,0],
                                        idx=idx,
                                        use_global_mean=True,
                                        norm='std',
                                        use_global_norm=True)


Display the stratified regression metrics in a table format (Fig. 10): 


metrics.print_stratified_metrics(table_format=['pandas'], metrics=['NRMSE'])


      Observations   Min            Max            NRMSE
k1    453            -91,760.3782   -10,016.6359   3.0987
k2    3230           -9,989.6371    -0.0000        0.4287
k3    6044           0.0000         9,965.7716     0.1419
k4    73             10,002.6112    24,987.7263    0.6479


Fig. 10 Output of analysis.RegressionAssessment.print_stratified_metrics.
The stratified metrics let us see that kernel regression performed relatively
well for $S_{Z,1} > -10{,}000$ with NRMSE values less than 1.0 in bins
$k_2$, $k_3$ and $k_4$. However, we see that for observations in bin $k_1$, corresponding
to the smallest values of $S_{Z,1}$, the NRMSE is significantly higher. The
results of the stratified NRMSE values are consistent with what we have
seen in Fig. 7 that visualized the regression result. We have seen a significant
departure between the observed and predicted data surfaces for highly
negative values of $S_{Z,1}$. Finally, we note that the stratified regression metrics
can be computed in bins obtained using any data clustering technique
of choice. A good overview of data clustering algorithms can be found in
(Thrun and Stier 2021). Some of those techniques are also implemented
in the scikit-learn Python library (Pedregosa 2011).
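
The idea behind stratified metrics can be sketched with NumPy alone. The example assumes that bins are defined by split values of the observed variable and that the per-bin RMSE is normalized by the global standard deviation, loosely mirroring the use_global_norm option above; the function name stratified_nrmse and the synthetic data are hypothetical.

```python
import numpy as np

def stratified_nrmse(observed, predicted, split_values):
    """Return {bin number: (number of observations, NRMSE)} using a global normalization."""
    # np.digitize assigns 0 to values below the first split value, 1 to the next bin, etc.
    bin_index = np.digitize(observed, split_values)
    global_std = observed.std()
    result = {}
    for k in range(len(split_values) + 1):
        mask = bin_index == k
        if mask.any():
            rmse = np.sqrt(np.mean((observed[mask] - predicted[mask])**2))
            result[k + 1] = (int(mask.sum()), rmse / global_std)
    return result

rng = np.random.default_rng(0)
observed = rng.uniform(-2.0, 2.0, size=2000)

# Synthetic prediction that is exact everywhere except for the most negative values.
predicted = observed.copy()
predicted[observed < -1.0] += 0.5

stratified = stratified_nrmse(observed, predicted, split_values=[-1.0, 0.0, 1.0])
for k, (count, value) in stratified.items():
    print(f"k{k}: {count} observations, NRMSE = {value:.4f}")
```

A single global NRMSE would hide the fact that all of the error sits in the first bin, which is the situation the stratified table in Fig. 10 exposes for bin k1.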


5 Applications of the Principal Component Transport 
in Combustion Simulations 


Using large detailed chemical mechanisms inside a numerical simulation can become 
a tedious task, especially when other complex phenomena are involved, such as tur- 
bulence or pollutant formation. Therefore, parameterization of the thermo-chemical 
state of a reacting system using a reduced set of optimally chosen variables is very 
appealing. In this context, PCA is well suited: it automatically
reduces dimensionality while retaining most of the variance of the system. As we
have seen in Sect. 4.2.1, substantial reduction in the number of governing equations
of the system can be made by transporting only a subset of the PCs in a numerical
simulation. In this section, we present recent applications of the PC-transport
approach as reported by Malik et al. (2018, 2020).


5.1 A Priori Validations in a Zero-Dimensional Reactor 


We first show the application of the PC-transport approach in the context of
zero-dimensional perfectly stirred reactor (PSR) calculations (Malik et al. 2018). The
model validation was done a priori, meaning that the model training and validation
were made using the same PSR configuration. Two different fuels were investigated:
methane (CH4) and propane (C3H8). For each fuel, the dataset for PCA was generated
with unsteady PSR simulations, varying the residence time in the reactor from
extinction to equilibrium. For each residence time inside the reactor, the entire
temporal solution from initialization to steady-state was saved. The dataset for PCA
generated in this way contained approximately 100,000 observations for each state
variable for the methane case, and 420,000 observations for each state variable for
the propane case. In methane simulations, the GRI-3.0 chemical mechanism (Smith
et al. 2022) was used, with the $n$th species, N2, removed, resulting in 34 species.
For the propane case, the Polimi_1412 chemical mechanism (Humer et al. 2007)
was used, containing 162 species. The PCA basis was computed using the species mass
fractions alone ($X = [Y_1, Y_2, \dots, Y_{n-1}]$). The solution of the PC-transport model (as
per Eq. (15)) without coupling with nonlinear regression was first obtained, where
the predicted quantities were computed using an inverse PCA-basis transformation.
Then, the PC-transport approach was coupled with GPR regression (PCA-GPR) in
order to increase the dimensionality reduction potential of PCA. Both PC-transport
approaches were compared with the full solution obtained by transporting the original
species mass fractions (as per Eq. (3)).
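
The forward projection onto a truncated PCA basis and the inverse transformation used to recover the original variables can be sketched as follows. The data here are synthetic and lie close to a two-dimensional subspace; centering by the mean and scaling by the standard deviation are assumptions made for this illustration and need not match the preprocessing used by Malik et al. (2018).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a state-space matrix: 500 observations of 6 variables
# that lie close to a two-dimensional linear subspace.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 1e-3 * rng.normal(size=(500, 6))

# Center and scale the data (assumed preprocessing).
centers = X.mean(axis=0)
scales = X.std(axis=0)
X_cs = (X - centers) / scales

# PCA basis: eigenvectors of the covariance matrix sorted by decreasing eigenvalue.
covariance = X_cs.T @ X_cs / (X_cs.shape[0] - 1)
eigenvalues, eigenvectors = np.linalg.eigh(covariance)
order = np.argsort(eigenvalues)[::-1]
A = eigenvectors[:, order]

# Forward projection onto the first q principal components.
q = 2
A_q = A[:, :q]
Z = X_cs @ A_q

# Inverse PCA-basis transformation back to the original variables.
X_reconstructed = (Z @ A_q.T) * scales + centers

print("retained PCs:", Z.shape[1])
print("max reconstruction error:", np.max(np.abs(X - X_reconstructed)))
```

Because the inverse transformation is linear, the reconstruction is only as good as the linear subspace spanned by the retained PCs, which is why many components were needed before nonlinear regression was introduced.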


5.1.1 Simulation Results for Methane/Air Combustion 


Figure 11 shows the PSR solution for the temperature and the H2O and OH mass
fractions for the methane case. The results are obtained with the PC-transport model
without nonlinear regression using q = 24, q = 25 and q = 34 PCs (Fig. 11a) and the
PC-transport coupled with GPR regression using q = 1 and q = 2 PCs (Fig. 11b). For
comparison, the full solution, obtained by solving the governing equations for the
original state variables, is shown with the solid line. Using the PC-transport approach
without nonlinear regression, at least q = 25 components out of 34 were required to
obtain an accurate solution, which corresponds to a model reduction of 26%. On the other
hand, when the PC-transport model was coupled with GPR regression, the results show
remarkable accuracy using only q = 2 PCs for the prediction of temperature, and both major
and minor species. It can also be seen that the PCA-GPR model with q = 1 does
not provide sufficient accuracy in the ignition region, under-estimating the ignition
delay.


5.1.2 Simulation Results for the Propane/Air Combustion 


Figure 12 shows the PSR solution for the temperature, and the CO2 and O2 mass
fractions for the propane case. With the PC-transport model without regression (Fig. 12a),
at least q = 142 components out of 162 are required in order to get an accurate
description, representing a model reduction of 12%. By combining the PC-transport
model with the potential offered by nonlinear regression (PCA-GPR), the number
of required components can be reduced down to q = 2. Although the reduced model
performs well overall, some deviation from the full solution was observed in the
ignition/extinction region. The PCA-GPR model was then further improved by dividing
the PCA manifold into two clusters and performing GPR regression locally in each
cluster (PCA-L-GPR). By doing so, the level of accuracy of the model is significantly
improved, leading to an almost perfect match with only q = 2 components instead
of 162 (reduction of 98%). This improvement can be observed in Fig. 12b.
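
The model reduction percentages quoted in this section follow from the fraction of transported variables that is removed. The short sketch below recomputes them; the function name model_reduction is a hypothetical helper introduced only for this check.

```python
def model_reduction(n_original, n_retained):
    """Percentage of transported variables removed by the reduced model."""
    return 100.0 * (n_original - n_retained) / n_original

cases = [
    ("methane, PC-transport without regression", 34, 25),
    ("propane, PC-transport without regression", 162, 142),
    ("propane, PCA-L-GPR", 162, 2),
]

for label, n_original, n_retained in cases:
    print(f"{label}: {model_reduction(n_original, n_retained):.1f}%")
```

These values correspond to the figures of roughly 26%, 12% and 98% reported in the text.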
Fig. 11 Results of a priori PC-transport simulation of methane/air combustion in a zero-
dimensional PSR reactor. Predictions of the temperature, H2O and OH mass fractions as a function
of the residence time in the reactor with the solid line representing the full solution. The results
are shown for a the PC-transport model without regression using q = 24, q = 25 and q = 34 PCs
and b the PC-transport model coupled with GPR regression using q = 1 and q = 2 PCs. Reprinted
from (Malik et al. 2018) with permission from Elsevier

Fig. 12 Results of a priori PC-transport simulation of propane/air combustion in a zero-dimensional
PSR reactor. Predictions of the temperature, CO2 and OH mass fractions as a function of the 
residence time in the reactor with the solid line representing the full solution. The results are 
shown for a the PC-transport model without regression using q = 142 and q = 162 PCs and b 
the PC-transport model coupled with GPR regression performed globally (PCA-GPR) and locally 
(PCA-L-GPR) using q = 2 PCs. Reprinted from (Malik et al. 2018) with permission from Elsevier 


5.2 A Posteriori Validations on Sandia Flame D and F 


After validating the PCA-GPR approach in zero-dimensional calculations shown in
the previous section, the current section shows the application of the PCA-GPR model
in the framework of a non-premixed turbulent flame in a fully three-dimensional
LES. The validation was done using the experimental measurements of the Sandia
flames D and F (Barlow and Frank 1998). The Sandia flames D and F are piloted
methane/air diffusion flames. The fuel is a mixture of CH4 and air (25/75% by
volume) at 294 K. The fuel velocity is 49.6 m/s for flame D and 99.2 m/s for flame F,
the latter representing the most challenging test case, being close to global extinction.
The pilot jet surrounding the fuel consists of burnt gases at 1880 K and a low-velocity
coflow of air at 291 K surrounds the flame.

Fig. 13 The two-dimensional manifold obtained during PCA model training versus the manifold
accessed during simulation of the Sandia flame D and F. With the training data preprocessing
used here, a the first PC, $Z_1$, is highly correlated with mixture fraction and can be linked to the
mixture stoichiometry, and b the second PC, $Z_2$, is highly correlated with the CO2 mass fraction,
$Y_{CO_2}$. $Z_2$ can thus be interpreted as a variable describing reaction progress. c-d Scatter plots of the
PCA manifold obtained from the training dataset (black points) and the manifold accessed during
simulation (pink points) of c the Sandia flame D, and d the Sandia flame F. Points on the simulation-
accessed manifolds were down-sampled to 100,000 observations on each plot for clarity. Reprinted
from (Malik et al. 2020) with permission from Elsevier

The dataset for PCA model training is based on unsteady one-dimensional counter-flow
diffusion methane flames. The inlet conditions for the fuel and air were set as in
the experimental setup. Different counter-flow flames were generated by varying the
strain rate, from equilibrium to complete extinction. The dataset generated in this way
contained approximately 80,000 observations for each of the state-space variables.
The GRI-3.0 chemical mechanism (Smith et al. 2022) (without the N2 species) was
used. With the data preprocessing used here (including Pareto scaling and removal
of temperature from the state variables), the first PC ($Z_1$) was highly correlated
with the mixture fraction, whereas the second PC ($Z_2$) can be linked to a progress
variable with positive weights for the products and negative weights for the reactants.
These correlations between the PCs and physical variables are shown in Fig. 13a-b.
It is interesting to point out that PCA identified these controlling variables without
any prior assumptions or knowledge of the system of interest. All the state-space
variables, such as temperature, density and species mass fractions, as well as the PC
source terms, were regressed as functions of $Z_1$ and $Z_2$ using GPR (PCA-GPR). A
lookup table was then generated for the simulation.
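
The tabulation step can be illustrated with a small, self-contained sketch: a regressor is evaluated once on a structured grid in the ($Z_1$, $Z_2$) space, and the solver later retrieves values by interpolation. Everything in this example is an assumption made for illustration, including the grid bounds, the bilinear interpolation and the toy_regressor function standing in for a trained GPR model; the table format used by Malik et al. (2020) is not described here.

```python
import numpy as np

def build_table(regressor, z1_grid, z2_grid):
    """Evaluate the regressor on every node of a structured (Z1, Z2) grid."""
    Z1, Z2 = np.meshgrid(z1_grid, z2_grid, indexing="ij")
    queries = np.column_stack([Z1.ravel(), Z2.ravel()])
    return regressor(queries).reshape(z1_grid.size, z2_grid.size)

def lookup(table, z1_grid, z2_grid, z1, z2):
    """Bilinear interpolation of the tabulated values at the point (z1, z2)."""
    i = np.clip(np.searchsorted(z1_grid, z1) - 1, 0, z1_grid.size - 2)
    j = np.clip(np.searchsorted(z2_grid, z2) - 1, 0, z2_grid.size - 2)
    t = (z1 - z1_grid[i]) / (z1_grid[i + 1] - z1_grid[i])
    u = (z2 - z2_grid[j]) / (z2_grid[j + 1] - z2_grid[j])
    return ((1 - t) * (1 - u) * table[i, j] + t * (1 - u) * table[i + 1, j]
            + (1 - t) * u * table[i, j + 1] + t * u * table[i + 1, j + 1])

def toy_regressor(Z):
    # Hypothetical stand-in for a trained regression model of one dependent variable.
    return 2.0 * Z[:, 0] - 0.5 * Z[:, 1] + 1.0

z1_grid = np.linspace(-1.0, 1.0, 21)
z2_grid = np.linspace(0.0, 2.0, 11)
table = build_table(toy_regressor, z1_grid, z2_grid)

value = lookup(table, z1_grid, z2_grid, 0.33, 1.27)
print(value)
```

The design choice behind tabulation is to pay the cost of regression once, before the simulation, so that each cell of the flow solver only performs an inexpensive interpolation.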

The analysis of the manifold accessed during simulation is also interesting. In
Fig. 13c-d, we show the training PCA manifold (black points) overlaid with the manifold
accessed during simulation of flame D and F, respectively (pink points). In both
figures, points on the simulation-accessed manifold were down-sampled to 100,000
observations for clarity. It can be observed that both flame D and flame F simulations
accessed points that stayed close to the training manifold. The highest density of
points for flame D (Fig. 13c) is located near the equilibrium solution. This confirms
the experimental findings that flame D does not experience significant extinction and
re-ignition. On the other hand, it can be observed in Fig. 13d that flame F experiences
a higher level of extinction and re-ignition phenomena, which was expected from
the experimental data. For flame F, the point density is distributed more uniformly
between the equilibrium solution and the extinction regions of the training manifold
than for flame D. Thus, the manifold accessed during simulation of flame F covers
a larger region of the training manifold than for flame D.


5.2.1 Simulation Results for Methane/Air Combustion 


The simulations were performed in OpenFOAM using a tabulated chemistry approach.
The PCs were transported, and the dependent variables $\phi = [S_Z, T, \rho, Y_i]$ were
recovered from nonlinear regression. Details about the numerical setup can be found
in (Malik et al. 2020). Figure 14 shows the temperature and the OH mass fraction
profiles on the centerline (Fig. 14a), close to the burner exit (Fig. 14b) and further 
downstream (Fig. 14c) for flame D. It can be observed that the PCA-GPR model was 
able to reconstruct all variables with great accuracy. Moreover, a comparison is made 
between the PCA-basis calculated from the full set of 35 species and the PCA-basis 
computed from the reduced set of five major species only. The results are comparable 
for both bases, suggesting that using only the major species in order to build the PCA- 
basis results in no major loss of information. Figure 15 shows a comparison between 
the experimental and numerical profiles of temperature and selected species mass 
fraction on the centerline for flame F. The PCA-GPR model accurately predicts the 
peak and the decay in temperature and the species mass fraction profiles. 
Fig. 14 Results of a posteriori PC-transport simulation of the Sandia flame D. Predictions of the
temperature and the mass fraction of OH species a at the axial and b-c at the radial profiles. Results 
show a comparison between the PCA-basis calculated using the major species (PCA-GPR—major), 
the basis obtained using the full set of species (PCA-GPR—all) and the experimental data. Reprinted 
from (Malik et al. 2020) with permission from Elsevier 




Fig. 15 Results of a posteriori PC-transport simulation of the Sandia flame F. Predictions of a
the temperature and the major species mass fractions, b CH4, c CO2 and d O2, against the
experimental data at the flame centerline, plotted against the axial coordinate x [mm]. The results
are shown for the PC-transport model coupled with GPR regression where the PCA-basis was
calculated using the major species (PCA-GPR - major).
Reprinted from (Malik et al. 2020) with permission from Elsevier


6 Conclusions 


In this chapter, we review the complete workflow for data-driven reduced-order mod- 
eling of reacting flows. We present strategies for model reduction using dimensional- 
ity reduction techniques and nonlinear regression. The originally high-dimensional 
datasets can be re-parameterized with the new manifold parameters identified directly 
from training data. The main focus is on the PC-transport approach, where the original
system of PDEs is projected to a lower-dimensional PCA-basis. This approach allows 
for transporting a much smaller number of optimal manifold parameters and yields 
substantial model reduction. While in this chapter we review recent results from a 
priori and a posteriori combustion simulations using PC-transport, several important 
challenges still remain to be addressed in data-driven modeling of complex systems. 
For example, topological behaviors on manifolds, such as non-uniqueness or large 
spatial gradients of dependent variables, can hinder integration of model reduction 
with nonlinear regression. Possible future research directions that we delineate in 
this chapter are (1) developing tools for assessing quality of manifolds, (2) devel- 
oping strategies to mitigate undesired topological behaviors on manifolds and (3) 
improving our understanding and performance of nonlinear regression models. 
AI Super-Resolution: Application
to Turbulence and Combustion


M. Bode 


Abstract This article summarizes and discusses recent developments with respect to 
artificial intelligence (AI) super-resolution as a subfilter model for large-eddy simula- 
tions. The focus is on the application of physics-informed enhanced super-resolution 
generative adversarial networks (PIESRGANs) for subfilter closure in turbulence 
and combustion applications. A priori and a posteriori results are presented for var- 
ious applications, ranging from decaying turbulence to finite-rate chemistry flows. 
The high accuracy of AI super-resolution-based subfilter models is emphasized, and
advantages and shortcomings are described.


1 Introduction 


Many turbulent and reactive simulations require models to reduce the computational
cost. Popular approaches include large-eddy simulation (LES) for modeling (reactive)
turbulence and flamelet models for predicting chemistry. LES relies on the
filtered Navier-Stokes equations. The filter operation separates the flow into larger
scales above the filter width and smaller scales below the filter width, called subfilter
contributions. As a result, the filtered equations can be advanced at lower computational
cost; however, they require modeling of the subfilter contributions. Accurate
modeling of these unclosed terms is one of the key challenges for predictive LES.
LES has been applied successfully to many different turbulent flows including reactive
turbulent flows (Smagorinsky 1963; Pope 2000; Pitsch 2006; Beck et al. 2018;
Goeb et al. 2021). The flamelet concept employs asymptotic and scale arguments
to motivate that flow field and chemistry are only loosely coupled by the scalar
dissipation rate, a measure of the local mixing, in combustion. Consequently,
advancing chemistry is reduced to solving coupled one-dimensional (1-D) differential
equations, which are, for example, in mixture fraction space for non-premixed
combustion. Challenges include how to tabulate the resulting flamelets efficiently and
how to distribute the multiple flamelets across the domain for multiple representative
interactive flamelet (MRIF) approaches (Peters 1986; Banerjee and Ierapetritou
2006; Ihme et al. 2009; Bode et al. 2019b).
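
The scale separation introduced by the filter operation can be illustrated on a one-dimensional periodic signal. The sketch below assumes a top-hat (box) filter with an odd number of samples; the filter type, the filter width and the test signal are illustrative choices and are not taken from the works cited above.

```python
import numpy as np

def box_filter(signal, width):
    """Periodic top-hat filter averaging over `width` neighboring samples (width must be odd)."""
    half = width // 2
    kernel = np.ones(width) / width
    # Periodic padding so that the filtered signal has the same length as the input.
    padded = np.concatenate([signal[-half:], signal, signal[:half]])
    return np.convolve(padded, kernel, mode="valid")

n = 256
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)

# Large-scale motion plus a small-scale fluctuation.
u = np.sin(x) + 0.3 * np.sin(40.0 * x)

u_filtered = box_filter(u, width=17)   # resolved (filtered) field
u_subfilter = u - u_filtered           # subfilter contribution that needs modeling

print("variance of full signal:     ", np.var(u))
print("variance of filtered signal: ", np.var(u_filtered))
print("variance of subfilter signal:", np.var(u_subfilter))
```

The filtered field retains the large-scale sine wave almost unchanged, while most of the small-scale fluctuation ends up in the subfilter part, which is the quantity a subfilter model has to represent.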

Data-driven methods, such as machine learning (ML) and deep learning (DL), 
have gained a massive boost across almost all scientific domains, ranging from speech 
recognition (Hinton et al. 2012) and learning optimal complex control (Vinyals et al. 
2019) to accelerating drug development (Bhati et al. 2021). Important steps towards 
the wider usage of ML/DL methods were the availability of more and larger (labeled) 
datasets as well as significant improvements with respect to graphics processing units 
(GPUs), which enabled high-speed GPUs and efficient execution of ML/DL oper- 
ations on GPUs. One particular class of ML/DL is AI super-resolution, also called 
single image super-resolution (SISR), originally developed by the computer science 
community for increasing the resolution of 2-D images (i.e., to super-resolve images) 
beyond classical techniques, such as bicubic interpolation. The idea is that complex 
networks can extract and learn features during training with many images and are 
then able to add this information to images based on local information. Dong et 
al. (2014) introduced a super-resolution convolutional neural network (SRCNN), 
a deep convolutional neural network (CNN) which directly learns the end-to-end 
mapping between low and high resolution images. Several other works continuously 
improved this approach (Dong et al. 2015; Kim et al. 2016a,b; Lai et al. 2017; 
Simonyan and Zisserman 2014; Johnson et al. 2016; Tai et al. 2017; Zhang et al. 
2018) to achieve better prediction accuracy by correcting multiple shortcomings of 
the original SRCNN. The switch from CNNs to generative adversarial networks 
(GANs) (Goodfellow et al. 2014), as proposed by Ledig et al. (2017), finally resulted 
in the development of enhanced super-resolution GANs (ESRGANs) by Wang 
et al. (2018). 

The idea of AI super-resolution has been also successfully adopted for simulations 
of physical phenomena, from climate research (Stengel et al. 2020) to cosmology (Li 
et al. 2021). While many applications focus on super-resolving single time steps of 
simulations, Bode et al. (2019a, 2021, 2022), Bode (Bode 2022a, b, c) introduced an 
algorithm for employing AI super-resolution as a subfilter model for (reactive) LES. 
They developed the physics-informed enhanced super-resolution GAN (PIESRGAN) 
and demonstrated its application for various turbulent inert and reactive flows. To 
successfully use AI super-resolution to time-advance complex flows, accurate a priori
results are necessary but not sufficient. A model is only promising for complex flows
if it also gives good a posteriori results, i.e., if it remains accurate when used
continuously as the model over many consecutive time steps of a simulation. Typically,
good a posteriori results are much more difficult to achieve, as errors accumulate
over time, especially if low-dissipation solvers are used. Consequently, a posteriori
results are presented for all cases discussed in this chapter.

This work summarizes important modeling aspects of PIESRGAN in the next 
section. Afterward, its application to a decaying turbulence case, reactive spray 
setups, premixed combustion, and non-premixed combustion is described. This
chapter finishes with conclusions for further developments of the AI super-resolution 
approach in general and the PIESRGAN in particular. 


2 PIESRGAN 


This section summarizes the PIESRGAN and explains the PIESRGAN-subfilter 
modeling approach. Details about the architecture, the time advancement algorithm, 
and implementation details are given. Note that the PIESRGAN modeling approach 
presented in this work follows a hybrid approach. AI super-resolution is only used on 
the smallest scales to reconstruct the subfilter contributions, while the well-known 
filtered equations for LES are used to advance the flow in time, i.e., the time
integration is not part of the network. This approach is technically more complex
than embedding the time integration in the network. However, it is also expected to
be more general and universal. Turbulence is known to feature some universality on 
the smallest scales (Frisch and Kolmogorov 1995), which should be learnt by the 
network and be universal for many applications. The larger scales, which can be 
strongly affected by the geometry and setup and thus are fully case dependent, are 
considered by the filtered equations making PIESRGAN-subfilter models applicable 
for multiple cases. 


2.1 Architecture 


PIESRGAN is a GAN model, which is a generative model that aims to estimate the 
unknown probability density of observed data without an explicitly provided data 
likelihood function, i.e., with unsupervised learning. Technically, a GAN has two 
networks. The generator network is used for modeling and creates new modeled data. 
The discriminator network tries to distinguish whether data are generator-created or 
real data and provides feedback to the generator network. Thus, throughout the 
learning process, the generator gets better at creating data as close as possible to real 
data, and the discriminator learns to better identify fake data, which can be seen as 
two players carrying out a minimax zero-sum game to estimate the unknown data 
probability distribution. 
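
For orientation, the minimax game can be written in the generic form of Goodfellow et al. (2014). This is only the textbook GAN objective, not the relativistic variant that PIESRGAN uses (see the adversarial loss discussed with the loss function). The symbols $G$ (generator), $D$ (discriminator), $x$ (fully resolved data), and $\tilde{x}$ (filtered data fed to the generator) are notation assumed here for illustration.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{x}\!\left[\log D(x)\right]
+ \mathbb{E}_{\tilde{x}}\!\left[\log\bigl(1 - D\bigl(G(\tilde{x})\bigr)\bigr)\right]
```

The discriminator maximizes this expression by assigning high scores to real data and low scores to generated data, while the generator minimizes it by making $G(\tilde{x})$ hard to tell apart from $x$.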

The network architecture and training process are sketched in Fig. 1. Fully resolved 
3-dimensional (3-D) data (“H”) are filtered to get filtered data (“F”). The filtered 
data are used as input to the generator for creating the reconstructed data (“R”). The
accuracy of the reconstructed data is evaluated by means of the fully resolved data. 
The discriminator tries to distinguish between reconstructed and fully resolved data. 
The accuracy is measured by means of the loss function, which reads 


$$L = \beta_1 L_{\mathrm{adversarial}} + \beta_2 L_{\mathrm{pixel}} + \beta_3 L_{\mathrm{gradient}} + \beta_4 L_{\mathrm{physics}}, \qquad (1)$$

Fig. 1 Sketch of PIESRGAN. “H” denotes high-fidelity data, “F” are corresponding filtered data,
and “R” are the reconstructed data. The components are: Conv3D: 3-D convolutional layer;
LeakyReLU: activation function; DB: dense block; RDB: residual dense block; RRDB:
residual in residual dense block; $\beta_{\mathrm{RSF}}$: residual scaling factor; BN: batch normalization;
Dense: fully connected layer; Dropout: regularization component; $\beta_{\mathrm{dropout}}$: dropout factor.
Color-modified image from Bode et al. (2021)


where $\beta_1$ to $\beta_4$ are coefficients weighting the different loss term contributions with
$\sum_i \beta_i = 1$. The adversarial loss is the discriminator/generator relativistic adversarial
loss (Jolicoeur-Martineau 2018), which measures both how well the generator is able 
to create accurate reconstructed data compared to the fully resolved data and how 
well the discriminator is able to identify fake data. The pixel loss and the gradient 
loss are defined using the mean-squared error (MSE) of the quantity and its gradient, 
respectively. The physics loss enforces physically motivated conditions, such as the 
conservation of mass, species, and elements, depending on the underlying physics 
of the problem. For the non-premixed temporal jet application in this work, it reads 


$$L_{\mathrm{physics}} = \beta_{41} L_{\mathrm{mass}} + \beta_{42} L_{\mathrm{species}} + \beta_{43} L_{\mathrm{elements}}, \qquad (2)$$


where $\beta_{41}$, $\beta_{42}$, and $\beta_{43}$ are coefficients weighting the different physical loss term
contributions with $\sum_i \beta_{4i} = 1$. The physically motivated loss term is very important
for the application of PIESRGAN to flow problems. If the conservation laws are 
not fulfilled very well, the simulations tend to blow up rapidly, which is an impor- 
tant difference to super-resolution in the context of images. Errors which might be 
acceptable there can be easily too large for usage as a subfilter model (Bode et al. 
2021). 
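
The weighting in Eq. (1) can be illustrated with a minimal sketch on 1-D toy data. It is not the PIESRGAN implementation: the weights are the central values from Table 1, the forward-difference stencil and the helper names (`mse`, `gradient`, `total_loss`) are assumptions, and the adversarial and physics terms are passed in as plain numbers because they depend on the discriminator and the governing equations.

```python
# Illustrative sketch of Eq. (1) on 1-D toy data; not the PIESRGAN code.

# Central values of beta_1 .. beta_4 from Table 1.
BETAS = (0.6e-4, 0.88994, 0.06, 0.05)


def mse(a, b):
    """Mean-squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def gradient(field, dx=1.0):
    """Forward-difference gradient on a periodic 1-D mesh (assumed stencil)."""
    n = len(field)
    return [(field[(i + 1) % n] - field[i]) / dx for i in range(n)]


def total_loss(reconstructed, resolved, adversarial, physics, betas=BETAS):
    """Weighted sum of the four loss terms in Eq. (1).

    `adversarial` and `physics` are precomputed numbers here; evaluating them
    requires a discriminator and the conservation equations, respectively.
    """
    b1, b2, b3, b4 = betas
    pixel = mse(reconstructed, resolved)
    grad = mse(gradient(reconstructed), gradient(resolved))
    return b1 * adversarial + b2 * pixel + b3 * grad + b4 * physics


weight_sum = sum(BETAS)  # the central values add up to one
resolved = [0.0, 1.0, 0.0, -1.0]
reconstructed = [0.0, 0.9, 0.0, -0.9]
loss = total_loss(reconstructed, resolved, adversarial=0.7, physics=0.01)
```

With these weights the pixel term dominates the sum, so even a small adversarial weight is enough to steer the generator only if the other terms are already small.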

The generator heavily uses 3-D CNN layers (Conv3D) (Krizhevsky et al. 2012) 
combined with leaky rectified linear unit (LeakyReLU) layers for activation (Maas 
et al. 2013). The residual in residual dense block (RRDB), which was introduced for
ESRGAN, is essential for the performance of state-of-the-art super-resolution.
It replaces the residual block (RB) employed in previous architectures and contains
fundamental architectural elements such as residual dense blocks (RDBs) with
skip-connections. A residual scaling factor $\beta_{\mathrm{RSF}}$ helps to avoid instabilities in the
forward and backward propagation. RDBs use dense connections inside. The output
from each layer within the dense block (DB) is sent to all the following layers. The 
discriminator network is simpler. It inherits basic CNN layers (Conv3D) combined
with LeakyReLU layers for activation with and without batch normalization (BN).
The final layers contain a fully connected layer with LeakyReLU and dropout with
dropout factor $\beta_{\mathrm{dropout}}$. A summary of all hyperparameters is given in Table 1.


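Table 1 Overview of the PIESRGAN hyperparameters. The given ranges represent the sensitivity
intervals with acceptable network results. The central values were used for the decaying turbulence
case in this work

Hyperparameter                  [lower bound, central value, upper bound]
$\beta_1$                       [$0.2 \times 10^{-5}$, $0.6 \times 10^{-4}$, $0.8 \times 10^{-4}$]
$\beta_2$                       [0.79327, 0.88994, 0.91812]
$\beta_3$                       [0.04, 0.06, 0.15]
$\beta_4$                       [0.01, 0.05, 0.06]
$\beta_{\mathrm{RSF}}$          [0.1, 0.2, 0.3]
$\beta_{\mathrm{dropout}}$      [0.2, 0.4, 0.5]
$l_{\mathrm{generator}}$        [$1.2 \times 10^{-6}$, $4.5 \times 10^{-6}$, $5.0 \times 10^{-6}$]
$l_{\mathrm{discriminator}}$    [$4.4 \times 10^{-6}$, $4.5 \times 10^{-6}$, $8.5 \times 10^{-6}$]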
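
The role of the residual scaling factor can be sketched with a toy residual dense block. The "layers" below are placeholder scalar functions rather than trained Conv3D layers, the LeakyReLU slope is an assumption, and earlier outputs are summed instead of concatenated (as real dense blocks do) purely to keep the example short. Only the pattern `x + beta * F(x)` and the central value 0.2 from Table 1 are taken from the text.

```python
# Toy illustration of residual scaling; placeholder layers, not Conv3D.

BETA_RSF = 0.2  # central value of the residual scaling factor in Table 1


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation; the slope value is an assumption."""
    return x if x >= 0.0 else slope * x


def toy_layer(values, weight):
    """Stand-in for a convolutional layer: scale, then activate."""
    return [leaky_relu(weight * v) for v in values]


def dense_block(x, weights):
    """Every layer sees the block input plus all earlier layer outputs.

    Real dense blocks concatenate feature maps; summation is a simplification.
    """
    features = [x]
    for w in weights:
        combined = [sum(col) for col in zip(*features)]
        features.append(toy_layer(combined, w))
    return features[-1]


def residual_dense_block(x, weights, beta=BETA_RSF):
    """Skip connection with residual scaling: x + beta * F(x)."""
    fx = dense_block(x, weights)
    return [xi + beta * fi for xi, fi in zip(x, fx)]


out = residual_dense_block([1.0, -2.0], weights=[0.5, 0.5])
```

Scaling the residual branch by a factor below one keeps the block output close to its input, which is one way to read the statement that the factor helps to avoid instabilities.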


2.2 Algorithm 


The Favre-filtered LES equations are used to advance a PIESRGAN-LES in time. As a
consequence of applying the filter operation to the equations, unclosed
terms appear, which require information from below the filter width to be evaluated.
The LES subfilter algorithm aims to reconstruct this information to close the LES
equations. This is done during every time step. For the cases with chemistry, the
chemistry can be included in the PIESRGAN during the training process (Bode et al.
2022; Bode 2022a). As chemistry is often active only locally, this can also be used to save
computing time by adaptively solving only in relevant regions. The algorithm starts
with the LES solution $\Phi_{\mathrm{LES}}^{n}$ at time step $n$, which includes the entirety of all relevant
fields in the simulation, and consists of repeating the following steps:


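1. Use the PIESRGAN to reconstruct $\Phi_{\mathrm{R}}^{n}$ from $\Phi_{\mathrm{LES}}^{n}$.
2. (Only for nonuniversal quantities) Use $\Phi_{\mathrm{R}}^{n}$ to update the scalar fields of $\Phi$ to
   $\Phi_{\mathrm{R}}^{n;\mathrm{update}}$ by solving the unfiltered scalar equations on the mesh of $\Phi_{\mathrm{R}}^{n}$.
3. Use $\Phi_{\mathrm{R}}^{n;\mathrm{update}}$ to estimate the unclosed terms $\Psi_{\mathrm{LES}}^{n}$ in the LES equations of $\Phi$
   for all fields by evaluating the local terms with $\Phi_{\mathrm{R}}^{n;\mathrm{update}}$ and applying a filter
   operator.
4. Use $\Psi_{\mathrm{LES}}^{n}$ and $\Phi_{\mathrm{LES}}^{n}$ to advance the LES equations of $\Phi$ to $\Phi_{\mathrm{LES}}^{n+1}$.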
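
The four steps above can be sketched as a structural loop. This is a hedged outline, not the author's code: the function names and the callables `reconstruct`, `update_scalars`, `estimate_unclosed`, and `advance_les` are hypothetical placeholders for the generator, the unfiltered scalar solver, the filtered closure evaluation, and the LES time integrator.

```python
# Structural sketch of the PIESRGAN-LES loop; all callables are placeholders.


def piesrgan_les_step(phi_les, reconstruct, update_scalars,
                      estimate_unclosed, advance_les, nonuniversal=False):
    """Advance the LES solution by one time step.

    phi_les            LES solution at time step n
    reconstruct        step 1: maps LES fields to reconstructed fields
    update_scalars     step 2: solves unfiltered scalar equations (optional)
    estimate_unclosed  step 3: evaluates and filters the unclosed terms
    advance_les        step 4: time integrator for the filtered equations
    """
    phi_r = reconstruct(phi_les)              # step 1
    if nonuniversal:
        phi_r = update_scalars(phi_r)         # step 2, only if requested
    psi_les = estimate_unclosed(phi_r)        # step 3
    return advance_les(phi_les, psi_les)      # step 4


def run(phi0, n_steps, **callables):
    """Repeat the step for n_steps and return the final LES solution."""
    phi = phi0
    for _ in range(n_steps):
        phi = piesrgan_les_step(phi, **callables)
    return phi


# Toy usage: identity reconstruction and a linear damping "closure".
result = run(
    [1.0], 2,
    reconstruct=lambda phi: phi,
    update_scalars=lambda phi: phi,
    estimate_unclosed=lambda phi: [-0.1 * v for v in phi],
    advance_les=lambda phi, psi: [p + s for p, s in zip(phi, psi)],
)
```

Keeping the time integrator outside the network, as in this outline, mirrors the hybrid approach described at the start of this section.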


2.3 Implementation Details


PIESRGAN was implemented using a TensorFlow/Keras framework (Abadi et al. 
2016; Keras 2019) in this work to efficiently employ GPUs. For all the examples 
discussed here, the data were split into training and testing sets to avoid reproduction 
of fully seen data. During the training and querying processes, it was found that 
consistent normalization of quantities is very important for highly accurate results 
(Bode et al. 2021). Furthermore, both operations are done based on subboxes, since 
reconstructing bigger boxes can become very memory intensive. Typically, each 
subbox is chosen large enough to cover the relevant physical scales (Bode et al. 2021). 
The filter width can become problematic if non-uniform meshes are employed. In 
these cases, training with multiple filter widths is suggested to achieve good accuracy 
throughout the entire domain (Bode 2022a). 
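
A minimal sketch of the two practical points above, subbox-wise processing and consistent normalization, is given below. It is an assumption-laden illustration: the helper names are hypothetical, halo or overlap cells that a real implementation may need are omitted, and "consistent" is taken to mean that the statistics computed from the training data are reused unchanged at query time.

```python
from itertools import product


def subbox_slices(shape, box):
    """Yield tuples of slices that tile a 3-D domain with subboxes.

    Edge subboxes are truncated if `shape` is not a multiple of `box`.
    """
    ranges = [range(0, n, b) for n, b in zip(shape, box)]
    for starts in product(*ranges):
        yield tuple(slice(s, min(s + b, n))
                    for s, b, n in zip(starts, box, shape))


def make_normalizer(training_values):
    """Return (normalize, denormalize) with statistics fixed from training."""
    n = len(training_values)
    mean = sum(training_values) / n
    std = (sum((v - mean) ** 2 for v in training_values) / n) ** 0.5
    std = std if std > 0.0 else 1.0  # guard against constant fields
    return (lambda v: (v - mean) / std), (lambda z: z * std + mean)


boxes = list(subbox_slices((64, 64, 64), (16, 16, 16)))
normalize, denormalize = make_normalizer([1.0, 2.0, 3.0, 4.0])
```

Each tuple in `boxes` can be used to index one subbox of a 3-D array, so memory use is bounded by the subbox size rather than by the full domain.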

Extrapolation is a persistent challenge for data-driven methods. Many trained
networks only work well in regions which were accessible during
the training process. This can become very problematic for flow applications, where
data at low Reynolds numbers are often abundant, while data at high Reynolds numbers
cannot be computed at all, which makes transfer learning difficult. To deal with this
problem, concepts such as a two-step training approach (Bode et al. 2021) can be
used, relying on the wider prediction range of GANs compared to single networks
(Bode et al. 2022; Bode 2022a). To sidestep this open question of extrapolation
capabilities, only interpolation cases are presented in this work.

A basic version of PIESRGAN is available on GitLab
(https://git.rwth-aachen.de/Mathis.Bode/PIESRGAN.git) for the interested reader.


3 Application to Turbulence 


The application of PIESRGAN to non-reactive turbulence is a good starting point. 
Besides closing the filtered momentum equations, the evaluation of passive scalars 
is a key challenge toward applying PIESRGAN to turbulent reactive flows, as scalar 
mixing is especially important for non-premixed combustion cases. Furthermore,
turbulence is assumed to be universal on the smallest scales, which makes it reasonable
to expect that a complex network can accurately learn the subfilter behavior.


3.1 Case Description 


A decaying turbulence case with a peak wavenumber $\kappa_p$ of 15 m$^{-1}$ and a maximum
Taylor microscale-based Reynolds number $Re_\lambda$ of about 88 is used as the turbulent
example case here. Turbulence with an initial turbulence intensity of $u_0^{\prime 2} = 2\langle k \rangle/3$,
with $\langle k \rangle$ as the ensemble-averaged turbulent kinetic energy, was initialized on a uniform
mesh with $4096^3$ cells and solved along with passive scalars. The original direct numerical
simulation (DNS) was computed with the solver psOpen (Gauding et al. 2019). 
psOpen employs the P3DFFT library for spatial decomposition and to perform the 
fast Fourier transform (FFT) (Pekurovsky 2012) of the incompressible Navier-Stokes 
equations formulated in spectral space, but with the non-linear term computed in 
physical space. Over time, the turbulent intensity decays, i.e., the Reynolds number 
decreases, resulting in larger turbulent structures. This makes the decaying turbulence 
case a very good baseline application, as many practical applications also feature
varying Reynolds numbers. 

The corresponding PIESRGAN-LES was computed with CIAO, an arbitrary order 
finite-difference code (Desjardins et al. 2008). The physics-informed loss function 
only considered a condition for enforcing mass conservation. Further details can be 
found in Bode et al. (2021). 


3.2 A Priori Results 


To evaluate the accuracy of PIESRGAN, Fig. 2 shows 2-D slices of the fully
resolved velocity and scalar fields, the filtered fields, and the reconstructed fields 
employing PIESRGAN. The visual agreement is good, and the network seems to 
be able to add sufficient information to the filtered fields to reconstruct the fully 
resolved data. Bode et al. (2021) pointed out that high accuracy can also be achieved 
in scenarios in which PIESRGAN needs to “extrapolate” training data using a two- 
step training approach. The two-step training approach combines fully resolved data 
for updating generator and discriminator and underresolved training data, which 
further update the generator. This is an important feature of the employed GAN 
approach as many practical use cases feature Reynolds numbers which cannot be 
computed with DNS. 

In addition to the visual assessment of the PIESRGAN, Fig. 3 shows the dimensionless
spectra for the velocity vector field and the passive scalar.
The spectra are computed with the fully resolved fields, the filtered fields, and the 
reconstructed fields and are an important measurement for the prediction quality of 
PIESRGAN, as they quantify the distribution of turbulent energy and scalar among 
the length scales. The filter operation removes the smallest scales, and the task of 
the PIESRGAN model is to add the smallest scales to reconstruct the fully resolved 
distribution. The agreement is good for both spectra but not perfect at very
high wavenumbers, i.e., for $\kappa/\kappa_p \gtrsim 80$. It is important to note that the numerics have
a significant impact on the results in Fig. 3. Only high-order and consistent numerics
avoid significant noise at high wavenumbers in the reconstructed data.
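
How a filter removes energy at high wavenumbers can be illustrated with a 1-D toy signal. This is a sketch under stated assumptions: the top-hat filter, its width, the two-mode signal, and the naive O(N^2) Fourier transform (chosen only to avoid external dependencies) are illustrative choices and not the procedure used for Fig. 3.

```python
import cmath
import math


def box_filter(u, width):
    """Periodic top-hat filter over `width` cells (width assumed odd)."""
    n, half = len(u), width // 2
    return [sum(u[(i + j) % n] for j in range(-half, half + 1)) / width
            for i in range(n)]


def energy_spectrum(u):
    """Squared magnitude of the Fourier coefficients for k = 0 .. N/2.

    Naive discrete Fourier transform; use an FFT library for real data.
    """
    n = len(u)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(u[j] * cmath.exp(-2j * math.pi * k * j / n)
                    for j in range(n)) / n
        spectrum.append(abs(coeff) ** 2)
    return spectrum


n = 64
x = [2.0 * math.pi * i / n for i in range(n)]
# One large-scale mode (k = 2) and one small-scale mode (k = 20).
u = [math.sin(2 * xi) + 0.3 * math.sin(20 * xi) for xi in x]
e_full = energy_spectrum(u)
e_filtered = energy_spectrum(box_filter(u, 5))
```

In this toy setting the filter leaves the large-scale mode almost untouched and strongly damps the small-scale mode, which is the part of the spectrum a reconstruction model has to restore.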

Fig. 2 Visualization of 2-D slices of the dimensionless passive scalar $z^*$ and the dimensionless
velocity component $u^*$ for the time step with Taylor microscale-based Reynolds number of about
88. Colormaps span from blue (minimum) to red (maximum) (Bode et al. 2021)

Fig. 3 Dimensionless spectra plotted over the normalized wavenumber $\kappa/\kappa_p$ and evaluated
on DNS data, filtered data, and reconstructed data for the dimensionless velocity vector $u^*$ and
passive scalar $z^*$ for the time step with Reynolds number of about 88. Note that the symbols do not
represent the discretization but are only used to distinguish the different cases. Modified plot from
Bode et al. (2021)

Fig. 4 Evolution over dimensionless time $t^*$ of the ensemble-averaged dimensionless turbulent
kinetic energy $\langle k^* \rangle$ and ensemble-averaged dimensionless dissipation rate $\langle \varepsilon^* \rangle$. Plot from Bode
et al. (2021)


3.3 A Posteriori Results 


A PIESRGAN-LES must accurately predict the decay of turbulence, usually measured
by means of the ensemble-averaged turbulent kinetic energy $\langle k \rangle$ and the ensemble-averaged
dissipation rate $\langle \varepsilon \rangle$. A uniform LES mesh with $64^3$ cells was considered,
and the results are presented in Fig. 4. The prediction accuracy of PIESRGAN-LES is
high. The results for a heavily underresolved simulation without an LES model show that
the ensemble-averaged dissipation rate in particular is strongly underpredicted without
a model. This is expected, as the dissipation rate acts on the smallest scales, which
simply do not exist in the underresolved simulation due to the lack of resolution.
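
The statement that the dissipation rate lives on the smallest scales can be made concrete with a 1-D surrogate. The sketch below is not the evaluation used for Fig. 4: the two-mode signal, the viscosity value, the central-difference stencil, and the block-averaging used to mimic an underresolved mesh are all assumptions, and the 1-D expressions only stand in for the full 3-D definitions.

```python
import math

NU = 1.0e-3  # kinematic viscosity, arbitrary illustrative value


def mean(values):
    return sum(values) / len(values)


def ddx(u, dx):
    """Central difference on a periodic mesh."""
    n = len(u)
    return [(u[(i + 1) % n] - u[i - 1]) / (2.0 * dx) for i in range(n)]


def kinetic_energy(u):
    """1-D surrogate for the averaged turbulent kinetic energy, 0.5 <u u>."""
    return 0.5 * mean([v * v for v in u])


def dissipation(u, dx, nu=NU):
    """1-D surrogate for the averaged dissipation rate, nu <(du/dx)^2>."""
    return nu * mean([g * g for g in ddx(u, dx)])


def coarsen(u, factor):
    """Average blocks of `factor` cells to mimic an underresolved mesh."""
    return [mean(u[i:i + factor]) for i in range(0, len(u), factor)]


n = 256
dx = 2.0 * math.pi / n
# Energetic large-scale mode plus a weak small-scale mode.
u_fine = [math.sin(i * dx) + 0.2 * math.sin(30 * i * dx) for i in range(n)]
u_coarse = coarsen(u_fine, 16)

k_ratio = kinetic_energy(u_coarse) / kinetic_energy(u_fine)
eps_ratio = dissipation(u_coarse, 16 * dx) / dissipation(u_fine, dx)
```

For this toy signal the coarse mesh retains most of the kinetic energy but only a small fraction of the dissipation, which matches the qualitative behavior described for the simulation without a model.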


3.4 Discussion 


The presented a posteriori results are remarkable, as the trained network is able to
reproduce the decay on a mesh with several orders of magnitude fewer cells. One reason
for this could be the universal character of turbulence on the smallest scales. From
a computational point of view, too drastic a reduction of the mesh size might not result in the
fastest time-to-solution, as the cost of subbox reconstruction increases with the
reconstruction size. Thus, a finer LES mesh with smaller subbox reconstruction can be
faster, as demonstrated by the two turbulent combustion cases below. Furthermore, if
the network is used as part of a multi-physics simulation, LES meshes which
are only 10-20 times coarser per direction than a DNS, which fully resolves the
turbulence, are often needed to accurately consider boundary conditions and other
physical phenomena. In this context, it is also interesting to mention the effect of the
Courant-Friedrichs-Lewy (CFL) number. Theoretically, coarser LES meshes also
enable larger time steps. However, it was found that usually a time step size between
the DNS and theoretical LES time step sizes is needed to accurately reproduce the
DNS results. The reason might be that the CFL number is only a numerical limit, whereas
the PIESRGAN-LES also needs to fulfil some intrinsic physical time step limitations.
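
The mesh and time step ratios for the decaying turbulence case can be checked with a few lines of arithmetic. The cell counts are taken from the text; the domain length, reference velocity, and CFL number are arbitrary assumptions that cancel out in the ratios, and the convective limit `dt = CFL * dx / |u|` is the standard textbook form rather than the solver's actual time step control.

```python
N_DNS, N_LES = 4096, 64   # cells per direction, from the text
DOMAIN_LENGTH = 1.0       # arbitrary, cancels out in all ratios
U_REF = 1.0               # arbitrary reference velocity
CFL = 0.5                 # arbitrary illustrative CFL number


def cfl_time_step(cfl, dx, velocity):
    """Convective CFL limit: dt = CFL * dx / |u|."""
    return cfl * dx / abs(velocity)


coarsening = N_DNS / N_LES          # coarsening factor per direction
cell_reduction = coarsening ** 3    # reduction in total cell count

dt_dns = cfl_time_step(CFL, DOMAIN_LENGTH / N_DNS, U_REF)
dt_les = cfl_time_step(CFL, DOMAIN_LENGTH / N_LES, U_REF)
dt_ratio = dt_les / dt_dns          # theoretical upper bound on the gain
```

The LES mesh is 64 times coarser per direction and has 262,144 times fewer cells. The same factor of 64 is the theoretical time step gain, and the text indicates that the usable time step lies somewhere between `dt_dns` and `dt_les`.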

Overall, PIESRGAN has many advantages for turbulent flows. It can be used not only
to reduce computing and storage costs but also to enable new workflows.
For example, smaller domains can be computed first to get accurate training data.
Afterward, the trained model is applied to a larger domain to achieve converged
statistics. In addition to the discussed LES application, it could also be used as a cheap
turbulence generator for complex simulations.


4 Application to Reactive Sprays 


Reactive sprays occur in many applications, such as diesel engines. Usually, the 
liquid fuel is injected into a combustion chamber where it finally burns. Before igni- 
tion can take place, multiple physical processes happen. The continuous liquid fuel 
phase splits into smaller ligaments and small droplets. These dispersed droplets start
evaporating and the resulting vapor mixes with the ambient gas forming a reactive
mixture in which the combustion process occurs. The more these stages are spatially
separated, the more similar the final combustion process becomes to classical
non-premixed combustion. A measure of this separation is the difference between
the lift-off length (LOL), i.e., the distance between the nozzle tip and the closest combustion
events, and the liquid penetration length (LPL), i.e., the distance between the nozzle tip
and approximately the furthest fuel in the liquid phase. This work focuses on the Spray A and
Spray C cases defined by the Engine Combustion Network (ECN) (2019). 


4.1 Case Description 


Spray A and Spray C are both single-hole nozzles. However, while Spray A is designed
to avoid cavitation, Spray C features cavitation. Additionally, Spray A has a smaller
exit diameter, like injectors used for diesel engines, while the exit diameter of Spray
C is larger, as for heavy-duty injectors. Both injectors were investigated with
n-dodecane as fuel at standard reactive conditions, i.e., 150 MPa injection pressure,
22.8 kg/m$^3$ ambient density, 15% ambient oxygen concentration, 900 K ambient
temperature, and 363 K fuel temperature. Furthermore, inert conditions, i.e., without
ambient oxygen, were run for Spray A, while Spray C was also simulated with 1000,
1100, and 1200 K ambient temperatures. The cases are denoted as SA900, SC900,
SC1000, SC1100, and SC1200 based on the nozzle geometry and ambient
temperature. Inert cases are indicated separately.

The cases were computed using CIAO with a similar setup as described by Goeb 
et al. (2021). More precisely, the initial droplets were generated based on a pre- 
computed droplet size distribution for the Spray A case (Bode et al. 2014, 2015). 
For the Spray C case, a blob method utilizing the effective liquid diameter at the 
nozzle exit was employed. Breakup and evaporation were modeled with Kelvin- 
Helmholtz/Rayleigh-Taylor (KH/RT) (Patterson and Reitz 1998) and Bellan’s evap- 
oration approach (Miller and Bellan 1999) for both cases. Velocity and mixing LES 
closure were based on PIESRGAN-subfilter modeling. Note that due to the lack 
of reactive spray DNS data and motivated by the separation of phenomena within 
the combustion process of sprays, the PIESRGAN was trained with the decaying 
turbulence data introduced in the previous sections. 

The reaction mechanism by Yao et al. (2017) was used for all simulations. A multiple
representative interactive flamelets (MRIF) approach was employed for chemistry
modeling, which is also summarized
in Fig. 5. The non-premixed flamelet approach assumes that chemistry and flow are
only loosely coupled through the scalar dissipation rate. Consequently, two different
sets of equations are solved in MRIF approaches. The first set comprises the usual
flow equations solved in 3-D physical space. The second set, called the flamelet equations,
describes the chemistry in mixture fraction space $Z$, which is only 1-D.
Therefore, representing and solving the chemistry by means of the flamelet equations
is much cheaper than solving the chemistry in full 3-D physical space.
As shown by the equations in Fig. 5, the mapping towards the flamelet space is done
by weighted volume-averages, while the mapping back to physical space employs
probability density functions (PDFs), typically constructed by means of the filtered
mixture fraction and mixture fraction variance.

[Figure 5: schematic with two coupled blocks, "1-D Flamelet Solver" and "3-D CFD Solver: Advance flow equations in time"]

Fig. 5 Schematic representation of the MRIF approach and its coupling to 3-D computational
fluid dynamics (CFD) solver. Tilde denotes Favre-filtered data. The overbar indicates
Reynolds-averaging. The hat labels quantities in mixture fraction space. $Z$ is the mixture fraction, $W_i$ the
flamelet weights, $p$ the pressure, $\chi$ the scalar dissipation rate, $\rho$ the density, $Y_\alpha$ the mass fractions,
$e$ the internal energy, and $T$ the temperature. $\beta$ denotes the presumed $\beta$-PDF, and $f$ indicates
the functional form of the scalar dissipation rate. The spatial coordinates are represented by $x$,
and integration over the volume of the full domain is described by $\int \mathrm{d}V$. All variables are time
dependent, but $t$ is omitted here for brevity. Image from Bode (2022c)
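
The mapping from mixture fraction space back to physical space with a presumed PDF can be sketched as follows. This is a generic illustration, not the MRIF implementation: it assumes a beta distribution parameterized by the filtered mixture fraction and its variance, a simple midpoint rule, and a hypothetical flamelet profile passed in as a Python function. Density weighting and multiple flamelets are left out.

```python
import math


def beta_pdf_params(z_mean, z_var):
    """Shape parameters of a beta distribution with given mean and variance."""
    if not 0.0 < z_mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < z_var < z_mean * (1.0 - z_mean):
        raise ValueError("variance must satisfy 0 < var < mean * (1 - mean)")
    gamma = z_mean * (1.0 - z_mean) / z_var - 1.0
    return z_mean * gamma, (1.0 - z_mean) * gamma


def beta_pdf(z, a, b):
    """Beta probability density, evaluated via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(z)
                    + (b - 1.0) * math.log(1.0 - z))


def filtered_value(profile, z_mean, z_var, n=2000):
    """Midpoint-rule integral of profile(Z) * beta_pdf(Z) over 0 < Z < 1.

    Adequate for smooth PDFs (both shape parameters above one); PDFs with
    endpoint singularities would need a more careful quadrature.
    """
    a, b = beta_pdf_params(z_mean, z_var)
    dz = 1.0 / n
    return sum(profile((i + 0.5) * dz) * beta_pdf((i + 0.5) * dz, a, b)
               for i in range(n)) * dz


# Hypothetical flamelet profile: a quantity peaking at Z = 0.3.
peak = filtered_value(lambda z: math.exp(-((z - 0.3) / 0.1) ** 2), 0.3, 0.02)
```

Integrating the constant profile returns one and integrating `Z` itself returns the prescribed mean, which is a quick consistency check on the PDF parameters.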

Thus, the MRIF approach typically requires a presumed functional form $f$ of the
scalar dissipation rate in mixture fraction space and the PDF of the mixture fraction.
For the functional form, a log-based profile is often presumed (Pitsch
et al. 1998), while a beta-PDF is often employed for the mixture fraction PDF. Both
quantities are critical for LES, as they often have significant subfilter contributions.
In the context of PIESRGAN modeling, both assumptions can be avoided by directly
evaluating both profiles on the reconstructed fields, which can improve the prediction
results of the simulations. For the Spray C cases, the mixture fraction PDF was
indeed evaluated based on the reconstructed data for the results presented here (Bode 
2022b). 
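
Evaluating the mixture fraction PDF directly on reconstructed data, instead of presuming its shape, amounts to building statistics from the reconstructed cells inside one filter volume. The sketch below is a simplified assumption of how this could look: uniform (not density-weighted) averaging, a fixed histogram on the interval from 0 to 1, sample values assumed to lie in that interval, and hypothetical helper names.

```python
def discrete_pdf(samples, n_bins=10, lo=0.0, hi=1.0):
    """Histogram-based PDF estimate from samples within one filter volume.

    Samples are assumed to lie in [lo, hi]; the value hi falls in the last bin.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in samples:
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(samples)
    return [c / (total * width) for c in counts]


def filtered_from_samples(profile, samples):
    """Filtered value of profile(Z) as a plain average over reconstructed cells."""
    return sum(profile(s) for s in samples) / len(samples)


# Reconstructed mixture fraction values in one LES cell (made-up numbers).
z_reconstructed = [0.1, 0.1, 0.3, 0.9]
pdf = discrete_pdf(z_reconstructed, n_bins=5)
z_squared_mean = filtered_from_samples(lambda z: z * z, z_reconstructed)
```

No functional form is imposed here, so bimodal or skewed subfilter distributions are represented as they occur in the reconstructed field.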


4.2 Results 


The lack of DNS data makes a distinction between a priori and a posteriori results
difficult. Instead, LES results are compared with experimental data here (Engine
Combustion Network 2019). Figures 6 and 7 compare the ignition delay time $t_i$ and
the LOL $l_{\mathrm{LOL}}$ for the considered spray cases. All simulations slightly underpredict the
experimental results. This could be due to the chemical kinetics mechanism used,
which has a significant impact on the ignition delay time. Furthermore, the ignition
delay time and, consequently, the LOL decrease with increasing ambient temperature.
These trends are correctly predicted for Spray C by the PIESRGAN-LESs.

Fig. 6 Ignition delay time $t_i$ for Spray A and Spray C cases

Fig. 7 LOL $l_{\mathrm{LOL}}$ for Spray A and Spray C cases


The near-nozzle experimental data for the inert Spray A case allow a further
evaluation of PIESRGAN-LES compared to classical LES with a dynamic Smagorinsky
(DS) model. Figure 8 compares the temporally and circumferentially averaged
fuel mass fraction for an underresolved simulation without model, a DS-LES, 
and a PIESRGAN-LES with experimental data. The agreement is best between 
PIESRGAN-LES and experimental data. Note that a similar resolution is chosen 
for DS-LES and PIESRGAN-LES here. It seems that the PIESRGAN-LES is more 
robust with respect to coarser resolutions. If a finer resolution were to be used, the 
results for PIESRGAN-LES and DS-LES would become more similar. 


4.3 Discussion 


The reactive spray cases computed with the PIESRGAN-subfilter model show that the
PIESRGAN-based subfilter approach can be used to actually compute complex flows 
with high accuracy. In terms of operations needed per time step, the PIESRGAN- 
subfilter model is more expensive than a classical DS approach. Furthermore, the 
PIESRGAN approach generates additional cost for training of the network. However, 
the PIESRGAN approach has the advantage of naturally running on GPUs which 
are responsible for the majority of floating point operations per second (FLOPS) in 
current supercomputer systems. 

As discussed, the PIESRGAN approach can be used to reduce model assumptions,
such as those made for the mixture fraction PDF and functional form of the
scalar dissipation rate, which is an advantage. The presented results demonstrate that
simulations without the discussed presumed closures but with PIESRGAN closure
are able to reasonably match experimental data. However, due to the lack of DNS
data and the multiple models which are still involved, such as breakup models and the
chemical mechanism, a detailed analysis of the impact of these closures on macroscopic
quantities, such as LOL and ignition delay time, remains difficult. Nevertheless,
it can be concluded that the PIESRGAN approach is very robust even in heavily
underresolved flow situations. This is an important feature for very complex simulations
such as full engine simulations. In these cases, it is impossible to sufficiently
resolve all parts and the robustness of closure models becomes significant.

Fig. 8 Temporally and circumferentially averaged fuel mass fraction $\langle Y_{\mathrm{fuel}} \rangle$ evaluated 18.75 mm
downstream from the nozzle and plotted against the radial distance from the spray axis $r$ in mm
for the experiment, the underresolved simulation, the PIESRGAN-LES, and the DS-LES. Plot from
Bode et al. (2021)


5 Application to Premixed Combustion 


In premixed combustion cases, fuel and oxidizer are completely mixed before com- 
bustion is allowed to take place. Typical examples include spark ignition engines and 
lean-burn gas turbines. Therefore, in contrast to non-premixed combustion, correctly 
predicting fuel-oxidizer mixing is less important for premixed combustion. 


5.1 Case Description


Falkenstein et al. (2020a, b, c) computed a collection of premixed flame kernels with
iso-octane/air mixtures under real engine conditions and with unity and constant
Lewis numbers. The case with unity Lewis number, i.e., featuring the same diffusion
coefficient for all scalar species, is used as the demonstration case in this work.
All simulations, DNS and PIESRGAN-LES, were computed with CIAO (Desjardins
et al. 2008). The DNS relies on the low-Mach number limit of the Navier-Stokes
equations employing the Curtiss-Hirschfelder approximation (Hirschfelder et al.
1964) for diffusive scalar transport and including the Soret effect. A mesh with $960^3$
cells was used. The iso-octane reaction mechanism features 26 species (Falkenstein
et al. 2020a). The setup puts one flame kernel in a homogeneous isotropic turbu- 
lence field. Consequently, the turbulence decays over time, while the flame kernel 
expands, wrinkles, and deforms from its originally spherical shape. As the resulting 
flame speed depends on the local curvature of the flame kernel, it is very important 
to accurately predict the flame surface density. For running PIESRGAN-LES, the 
training of PIESRGAN was performed with multiple filter stencil widths varying 
from 5 to 15 cells (Bode et al. 2022). 

Often, a reaction progress variable is defined to describe the temporal state of a
flame kernel. Falkenstein et al. (2020a) defined it as the sum of the mass fractions of H$_2$,
H$_2$O, CO, and CO$_2$ and introduced a simplified reaction progress variable $\zeta$. The
simplified reaction progress variable obeys a transport equation with
the thermal diffusion coefficient as diffusion coefficient, reading

$$\frac{\partial (\rho \zeta)}{\partial t} + \frac{\partial (\rho u_j \zeta)}{\partial x_j} = \frac{\partial}{\partial x_j}\left(\rho D \frac{\partial \zeta}{\partial x_j}\right) + \dot{\omega}_\zeta, \qquad (3)$$

employing Einstein's summation notation, with $\rho$ as fluid density, $t$ as time, $u_j$ as
velocity vector, $x_j$ as space vector, $D$ as thermal diffusion coefficient, and $\dot{\omega}_\zeta$ as
chemical source term of the simplified reaction progress variable, which is the sum
of the source terms of the species used for the definition of the reaction progress
variable. The evolution of one flame kernel realization is visualized in Fig. 9.

In contrast to the decaying turbulence and reactive spray cases presented in the 
previous sections, it is not sufficient to only train the PIESRGAN with turbulence data 
for finite-rate chemistry cases. Instead, the fully trained network based on decaying 
homogeneous isotropic turbulence was only used as starting network, which was 
further updated with finite-rate chemistry data. As a consequence, reconstruction is 
learnt for all species fields, and the optional solution step with the unfiltered transport 
equations on the finer mesh of the reconstructed data is employed. This combination 
of reconstructing and solving was found to be crucial for the accuracy of finite-rate 
chemistry flows (Bode et al. 2022; Bode 2022a). 

Fig. 9 Visualization of 2-D slices of the simplified reaction progress variable $\zeta$, the
source term of the simplified reaction progress variable $\dot{\omega}_\zeta$, and the velocity component $U$ (left to
right) for five increasing time steps (top to bottom) for the fully turbulent flame kernel with
unity Lewis number. The first time shown is $6.0 \times 10^{-5}$ s, and the time increment is $7.5 \times 10^{-5}$ s.
The final time is $3.6 \times 10^{-4}$ s, which is also used for the a priori analysis in Fig. 10. Colormaps span
from blue (minimum) to green to yellow (maximum). Note that the flame kernel does not break into
parts at the latest time shown. A coherent flame kernel topology was maintained at all times


5.2 A Priori Results 


Reconstruction results for the simplified reaction progress variable, two species mass 
fractions, and one velocity component are compared with fully resolved and filtered 
fields in Fig. 10. The agreement between fully resolved fields and reconstructed fields 
is good. The filtered data, which were filtered over 15 cells, are less sharp due to the 
smoothing of small-scale structures. 


5.3 A Posteriori Results 


Multiple quantities can be tracked during the evolution of the flame kernel. The flame
surface density $\Sigma$ can be evaluated by means of a phase indicator function $\Gamma(\mathbf{x}, t)$,
defined for a reaction progress variable threshold value $f$ as $\Gamma(\mathbf{x}, t) =
H(\zeta(\mathbf{x}, t) - f)$, with $H$ being the Heaviside step function. The surface density is
then given by

$$\Sigma = \langle |\nabla \Gamma| \rangle, \tag{4}$$

employing volume-averaging, denoted by $\langle \cdot \rangle$. Moreover, the corresponding
characteristic length scale $L_\Sigma$ can be defined as

$$L_\Sigma = \frac{4 \langle \Gamma \rangle \left(1 - \langle \Gamma \rangle\right)}{\Sigma}. \tag{5}$$

As for the decaying turbulence case before, the averaged turbulent kinetic energy
decays. In contrast, the flame surface density is expected to increase significantly
and the characteristic length scale $L_\Sigma$ should increase slightly. This is shown
in Fig. 11. The agreement between DNS and PIESRGAN-LES results is good.
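
The two diagnostics of this subsection can be sketched in a few lines of NumPy. This is only a minimal illustration and not the post-processing used for Fig. 11: the function name `surface_density`, the uniform grid spacing `dx`, the finite-difference gradient of the discrete indicator, and the definition $L_\Sigma = 4\langle\Gamma\rangle(1-\langle\Gamma\rangle)/\Sigma$ are assumptions made for the example.

```python
import numpy as np

def surface_density(zeta, threshold, dx):
    """Return (Sigma, L_Sigma) for a 2-D or 3-D scalar field on a uniform grid.

    Assumptions (not from the chapter): central finite differences for the
    gradient of the indicator, and L_Sigma = 4<G>(1 - <G>)/Sigma.
    The field must contain an interface, otherwise Sigma is zero.
    """
    # Phase indicator: 1 where the progress variable exceeds the threshold.
    gamma = (zeta >= threshold).astype(float)
    # One gradient array per spatial direction.
    grads = np.gradient(gamma, dx)
    grad_mag = np.sqrt(sum(g**2 for g in grads))
    sigma = grad_mag.mean()            # volume average of |grad(Gamma)|
    gamma_mean = gamma.mean()          # volume fraction above the threshold
    length = 4.0 * gamma_mean * (1.0 - gamma_mean) / sigma
    return sigma, length

# Planar "flame" in a unit cube: the progress variable equals the x-coordinate.
n = 32
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
zeta = np.broadcast_to(x[:, None, None], (n, n, n))
sigma, length = surface_density(zeta, threshold=0.5, dx=dx)
print(sigma, length)
```

For the planar test surface the interface area per unit volume is one, which gives a simple sanity check before the function is applied to a wrinkled flame kernel.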


5.4 Discussion 


The accuracy of PIESRGAN for premixed combustion cases is very promising. This
makes PIESRGAN-LES a very useful tool for the evaluation of cycle-to-cycle
variations (CCVs) and other complex phenomena in engines. A potential workflow
could first compute two DNS realizations of premixed flame kernels, which are used
for on-the-fly training of the PIESRGAN. The trained network is then used to compute
multiple PIESRGAN-LES realizations of the premixed flame kernel setup, providing
sufficient statistics to study CCVs. Bode et al. (2022a) also showed a certain
robustness of the PIESRGAN-subfilter model with respect to setup variations, which
might be partly a result of the GAN approach. Consequently, PIESRGAN could also be
employed to optimize geometries of turbines or devise optimal operating conditions
to reduce harmful emissions.

Fig. 10 Visualization of DNS, filtered, and reconstructed fields for the unity Lewis number case
employing PIESRGAN. Results for the simplified reaction progress variable $\zeta$, the C$_8$H$_{18}$ mass
fraction $Y_{\mathrm{C_8H_{18}}}$, the OH mass fraction $Y_{\mathrm{OH}}$, and the velocity component $U$ are shown. Colormaps
span from blue (minimum) to green to yellow (maximum). Note that the images are zoomed in
compared to the images presented in the last row in Fig. 9

As discussed in the context of reactive sprays, the reconstruction approach could
also be used to improve conventional models, which typically rely on filtered
probability functions. Instead, a PIESRGAN approach allows the filtered density
function (FDF) to be evaluated directly, increasing the model accuracy.


6 Application to Non-premixed Combustion 


In non-premixed combustion cases, fuel and oxidizer are initially separated. As a
consequence, mixing and continuous interdiffusion are necessary to establish a flame.
Typical examples are furnaces, diesel engines, and jet engines.


6.1 Case Description 


The study of non-premixed temporally evolving planar jets (Denker et al. 2020,
2021) was also performed with the CIAO code (Desjardins et al. 2008) and featured
multiple nonreactive and reactive cases with a highest initial jet Reynolds number
of 9850. It used methane as fuel, modeled by a reaction mechanism with 28 species.
The largest case used 1280 × 960 × 960 cells and is visualized in Fig. 12 by means
of the mixture fraction $Z$ and its scalar dissipation rate defined as

$$\chi = 2D \left( \frac{\partial Z}{\partial x_i} \right)^2 \tag{6}$$

with $D$ as diffusivity, $x_i$ as spatial coordinate, and utilizing Einstein's summation
notation. The temporal jet setup has two periodic directions: the flow direction (from
left to right) and the spanwise direction (perpendicular to the cut view in Fig. 12). The
moving layer of fuel is in the center and surrounded by originally quiescent air. At
the late time step shown, the central fuel stream has already experienced significant
bending due to turbulence, resulting in the lack of fuel in the upper half at about
one quarter length of the domain. Furthermore, it can be seen that the layer in which
scalar dissipation is active is broader than the fuel layer and that the scalar dissipation
rate structures are much finer than the mixture fraction structures, which results from
the derivative. Only one realization per parameter combination was computed; however,
the spanwise direction was chosen in such a way that turbulent statistics evaluated in
the two periodic directions converged. The nonperiodic direction was chosen large
enough to prevent interaction of the jet with the boundary. As for the premixed case,
a PIESRGAN with learnt chemistry was employed for the results presented here.

Fig. 12 Visualization of the turbulent non-premixed temporal jet at a late time step. The fuel is in
the center, two flames burn upwards and downwards, respectively, and the main flow direction is
from the left to the right. Upper half: Mixture fraction $Z$ on a linear scale. Colormap spans from
black (minimum) to red (maximum). Lower half: Scalar dissipation rate $\chi$ on a logarithmic scale.
Colormap spans from black (minimum) to red (yellow)
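
The definition of the scalar dissipation rate in Eq. (6) translates directly into array operations. The following sketch assumes a uniform grid and second-order finite differences; the function name `scalar_dissipation` and the numerical values of the grid and the diffusivity are hypothetical and chosen only so that the result can be checked by hand.

```python
import numpy as np

def scalar_dissipation(Z, D, dx):
    """chi = 2 D (dZ/dx_i)(dZ/dx_i), summed over all spatial directions.

    Expects a 2-D or 3-D array Z on a uniform grid with spacing dx.
    D may be a scalar or an array with the same shape as Z.
    """
    grads = np.gradient(Z, dx)          # one array per axis
    return 2.0 * D * sum(g**2 for g in grads)

n, dx, D = 16, 1.0e-3, 2.0e-5          # hypothetical grid and diffusivity
x = np.arange(n) * dx
# Linear mixture-fraction profile in x with slope 100 1/m, uniform in y and z.
Z = np.broadcast_to((100.0 * x)[:, None, None], (n, n, n))
chi = scalar_dissipation(Z, D, dx)
print(chi.mean())                       # uniform field equal to 2 * D * slope**2
```

Because the gradient is squared, small-scale wrinkles in $Z$ are amplified in $\chi$, which is consistent with the finer structures visible in the lower half of Fig. 12.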
6.2 A Priori Results


The scalar dissipation rate, i.e., a measure of local mixing, is essential for
non-premixed combustion, as burning requires the fuel and oxidizer streams to be mixed
first, resulting in a lower limit for the scalar dissipation rate required for burning.
As indicated by Fig. 12, the scalar dissipation rate is a quantity which acts on the
smallest scales, making it difficult for LES, as it usually has significant contributions
below the filter width. Furthermore, extinction (and later reignition) can occur in
regions where the scalar dissipation rate becomes too large, typically estimated by the
quenching scalar dissipation rate in so-called stationary flamelet solutions, denoted
as $\chi_q$. Overall, the scalar dissipation rate is a very well suited quantity to evaluate the
prediction accuracy of the PIESRGAN-model. The PDF $\mathcal{P}$ of the scalar dissipation
rate is shown in Fig. 13. As expected, the filtering leads to a lack of regions with very
high scalar dissipation rate. These missing values are successfully reconstructed
by the PIESRGAN-model via the mass fraction fields, i.e., the scalar dissipation
rate shown in the figure is a post-processed quantity relying on other reconstructed
quantities of the simulation data. The result in the log-log plot looks very good.
Note, however, that the increase of probability (from about $\chi = 0.1$ to $1\,\mathrm{s}^{-1}$) is much
better predicted with the reconstructed data than with filtered data alone, but is still
far from perfect.
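
The loss of high scalar dissipation rates under filtering can be reproduced with a toy signal. The sketch below is not based on the DNS data: it uses a synthetic, periodic 1-D mixture fraction with one large and one small scale, a top-hat filter over 15 cells, and logarithmically spaced histogram bins as used for a log-log PDF. All function names and numerical values are assumptions for illustration.

```python
import numpy as np

def box_filter_periodic(f, width):
    """Top-hat filter over an odd number of cells on a periodic 1-D signal."""
    half = width // 2
    padded = np.concatenate([f[-half:], f, f[:half]])
    return np.convolve(padded, np.ones(width) / width, mode="valid")

def ddx_periodic(f, dx):
    """Central difference on a periodic grid."""
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dx)

def log_binned_pdf(samples, n_bins=30):
    """PDF on logarithmically spaced bins (positive samples only)."""
    samples = samples[samples > 0.0]
    edges = np.logspace(np.log10(samples.min()), np.log10(samples.max()), n_bins + 1)
    density, _ = np.histogram(samples, bins=edges, density=True)
    return edges, density

n, D = 1024, 1.0                                    # hypothetical grid size, diffusivity
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
Z = 0.5 + 0.3 * np.sin(x) + 0.05 * np.sin(40.0 * x)   # large plus small scales

chi = 2.0 * D * ddx_periodic(Z, dx) ** 2             # from the resolved field
chi_filt = 2.0 * D * ddx_periodic(box_filter_periodic(Z, 15), dx) ** 2

edges, pdf = log_binned_pdf(chi)
edges_f, pdf_f = log_binned_pdf(chi_filt)
print(chi.max(), chi_filt.max())
```

The filtered signal has a visibly smaller maximum of $\chi$, i.e., the high-value tail of the PDF is truncated, which is the deficiency a reconstruction model is expected to repair.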


6.3 A Posteriori Results 


Typically, a non-premixed flame is located on surfaces of roughly stoichiometric
mixture fraction, which makes the scalar dissipation rate conditioned on the
stoichiometric mixture fraction an interesting quantity. Furthermore, a dimensionless
time is introduced, denoted as $t^*$. This time is shifted to make different cases
comparable, with the starting point defined as the time when the variance of the scalar
dissipation rate at stoichiometric conditions is zero. The normalization is done with
the jet time, defined as the ratio of the jet height to its bulk velocity, 32.3 mm/(20.7 m/s).
The time evolution of the ensemble-averaged density-weighted scalar dissipation
rate conditioned on the stoichiometric mixture fraction is compared between DNS
and PIESRGAN-LES in Fig. 14. The LES used training data of varying filter widths
with stencil sizes of 7 to 15 cells per direction (Bode 2022a). The prediction of the
LES is very good even though the peak is slightly underpredicted.
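
As a short worked example of the normalization, the jet time follows from the two values quoted in the text and is about 1.56 ms. The function name `dimensionless_time` and the argument `t_start` (the case-specific shift) are hypothetical.

```python
jet_height = 32.3e-3      # m, from the text
bulk_velocity = 20.7      # m/s, from the text
jet_time = jet_height / bulk_velocity   # s

def dimensionless_time(t, t_start):
    """t* = (t - t_start) / t_jet, with t_start the case-specific time shift."""
    return (t - t_start) / jet_time

print(f"jet time = {jet_time * 1e3:.3f} ms")
```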


6.4 Discussion 


The non-premixed case emphasizes two important points with respect to PIESRGAN 
modeling. First, as seen for the decaying turbulence case, the accuracy for predict- 
ing mixing is very high. This is crucial for many applications going far beyond 
combustion cases. Second, PIESRGAN is able to statistically predict a local phe- 
nomenon like quenching, which is very challenging for classical LES models. Both 
points make PIESRGAN very promising for predictive LES of even more complex 
configurations. 

The non-premixed case with more than one billion grid points and 28 species, 
chosen as an example in this section, also highlights the capability of PIESRGAN 
to be used for recomputing the largest available reactive DNS. This is technically 
remarkable and only possible due to the rapid developments in the fields of ML/DL 
and supercomputers in general. 
7 Conclusions


AI super-resolution is a powerful tool to improve various aspects of state-of-the- 
art simulations. These include the reduction of storage and input/output (I/O), a 
better comparability between experimental and simulation data, and highly accu- 
rate subfilter models for LES, as demonstrated by the examples discussed in this 
work. The remarkable progress in the fields of ML/DL and supercomputing in gen- 
eral, especially with respect to GPU computing, has made ML/DL-based techniques 
competitive and in some aspects even superior compared to classical approaches, and 
it is expected that the rapid developments in this field will continue in the upcoming 
years. 

The presented applications, ranging from turbulence to non-premixed combustion,
focused on the high accuracy of PIESRGAN-based approaches in a priori and a
posteriori tests. The a posteriori accuracy is especially striking, unveiling the potential
of the PIESRGAN-subfilter approach. Compared to classical methods, the LES mesh
can often be significantly reduced, as the PIESRGAN technique was found to be more
robust in underresolved flow situations.

From a technical point of view, PIESRGAN-based models are simple to use as they 
can be easily implemented in frameworks, such as Keras/TensorFlow and PyTorch, 
which are used by a very large community. The trained network can be coupled to 
any simulation code by just adapting the existing application programming interface 
(API) to external libraries. 

PIESRGAN-based subfilter modeling is a relatively new technique and thus many
questions are still open. The presented architecture gave good results, but it is
expected that it could be further improved. The approach of a physics-informed loss
function, compared to physics-informed network layers, seems to be reasonable and
has the advantage of a trivial implementation while resulting in equally accurate
predictions. One of the most important topics in the context of data-driven approaches is
the extrapolation capability, i.e., how accurate predictions are outside of the training
range. The recent publications (Bode et al. 2019a, 2021, 2022; Bode 2022a, b, c)
show some promising properties in this regard for PIESRGAN, but it should be
investigated in more detail in the future. Additionally, the combustion community
has computed petabytes of DNS data for various combustion configurations. Given
the demonstrated generality of PIESRGAN, in the sense that the same architecture
worked very well for multiple configurations, the combination of DNS databases and
PIESRGAN could already be very useful to advance combustion research. PIESRGAN
was also shown to be universal enough to use the same trained network for
physical parameter variations. Thus, many optimization problems could be easily
accelerated.
Abstract This chapter demonstrates three promising ways to combine machine
learning with physics-based modelling in order to model, forecast, and avoid ther- 
moacoustic instability. The first method assimilates experimental data into candidate 
physics-based models and is demonstrated on a Rijke tube. This uses Bayesian infer- 
ence to select the most likely model. This turns qualitatively-accurate models into 
quantitatively-accurate models that can extrapolate, which can be combined pow- 
erfully with automated design. The second method assimilates experimental data 
into level set numerical simulations of a premixed Bunsen flame and a bluff-body
stabilized flame. This uses either an Ensemble Kalman filter, which requires no prior 
simulation but is slow, or a Bayesian Neural Network Ensemble, which is fast but 
requires prior simulation. This method deduces the simulations’ parameters that best 
reproduce the data and quantifies their uncertainties. The third method recognises 
precursors of thermoacoustic instability from pressure measurements. It is
demonstrated on a turbulent Bunsen flame, an industrial fuel spray nozzle, and full scale
aeroplane engines. With this method, Bayesian Neural Network Ensembles deter- 
mine how far each system is from instability. The trained BayNNEs out-perform 
physics-based methods on a given system. This method will be useful for practical 
avoidance of thermoacoustic instability. 


1 Introduction 


At present there is no realistic alternative to combustion engines for long distance
aircraft and rockets. These engines have unrivalled power to weight ratios and their
fuels have unrivalled energy to weight ratios. If we continue to fly long distances
or send rockets into space, we will continue to combust fuels in increasingly
high-performance gas turbines and rockets. Despite decades of research and the
development of sophisticated physics-based models, thermoacoustic instability in these
engines remains difficult to predict and eliminate. The aim of this chapter is to
introduce some promising avenues in which machine learning methods could be used to
model, forecast, and avoid thermoacoustic instability.


M. P. Juniper
Engineering Department, University of Cambridge, Cambridge CB2 1PZ, UK
e-mail: mpj1001@cam.ac.uk


© The Author(s) 2023
N. Swaminathan and A. Parente (eds.), Machine Learning and Its Application to Reacting
Flows, Lecture Notes in Energy 44, https://doi.org/10.1007/978-3-031-16248-0_11


1.1 The Physical Mechanism Driving Thermoacoustic 
Instability 


The combustion chambers in aircraft and rocket engines have extraordinarily high 
power densities: from 100 MW/m³ in aircraft gas turbines to 50 GW/m³ in
liquid-fuelled rocket engines (Culick 2006). They contain flames that are typically anchored
by a recirculation zone (aircraft engines) or by fuel injector lips (rockets). Acoustic
velocity fluctuations perturb the base of the flame, creating ripples that convect 
downstream and cause heat release rate fluctuations some time later, which in turn 
create acoustic fluctuations either directly or via entropy spots (Lieuwen 2012). If 
moments of higher (lower) heat release rate coincide sufficiently with moments of 
higher (lower) pressure around the flame, then more work is done by the heated 
gas during the expansion phase of the acoustic cycle than was done on it during the 
compression phase. If the work done by thermoacoustic driving exceeds the work 
dissipated through damping or acoustic radiation over a cycle, then the acoustic 
amplitude grows and the system is thermoacoustically unstable. This is also known 
as combustion instability. In high performance rocket and aircraft engines, the heat 
release rate is so high and the natural dissipation so low that these engines can become 
thermoacoustically unstable even if the thermodynamic efficiency of the cycle is as 
little as 0.1% (Huang and Yang 2009). 
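
The phase argument above can be made concrete with a short numerical check. The sketch assumes unit-amplitude sinusoidal pressure and heat release rate fluctuations and evaluates the cycle average of their product, which is positive (driving) when they are in phase and negative (damping) when they are in antiphase. The function name `rayleigh_integral` and the normalization to a unit period are choices made for this example only.

```python
import numpy as np

def rayleigh_integral(phase, n=2000):
    """Cycle integral of p'(t) q'(t) for unit-amplitude sinusoids.

    `phase` is the lag of the heat release rate behind the pressure, in radians.
    """
    t = np.linspace(0.0, 1.0, n, endpoint=False)   # one period, T = 1
    p = np.cos(2 * np.pi * t)                       # pressure fluctuation
    q = np.cos(2 * np.pi * t - phase)               # heat release rate fluctuation
    return np.mean(p * q)                           # equals cos(phase) / 2

for phase in (0.0, np.pi / 2, np.pi):
    print(f"phase = {phase:.2f} rad, integral = {rayleigh_integral(phase):+.3f}")
```

Whether the oscillation actually grows then depends on whether this driving exceeds the damping and acoustic radiation, which are not modelled here.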

Thermoacoustic oscillations were first noticed over 200 years ago (Higgins 1802) 
and their physical mechanism was correctly identified nearly 150 years ago (Rayleigh 
1878). They were recognized as a significant problem in rocket engines 80 years 
ago and have been investigated seriously for 70 years (Crocco and Cheng 1956). 
Nevertheless, they remain a problem for the design of gas turbine and rocket engines 
because engineers are rarely able to predict, at the design stage, whether a particular 
engine will suffer from them (Lieuwen and McManus 2003; Mongia et al. 2003). This 
chapter explains why thermoacoustic instability is so difficult to predict accurately 
and explores various data-driven approaches that could develop into alternatives or 
additions to current physics-based approaches. 


1.2 The Extreme Sensitivity of Thermoacoustic Systems 


Thermoacoustic instability is difficult to predict for two main reasons. Firstly, if the
time lag between velocity fluctuations at the base of the flame and subsequent heat
release rate fluctuations is similar to or greater than the acoustic period, which is
usually the case, then the ratio of time lag to acoustic period strongly affects the
efficiency of the thermoacoustic mechanism (Juniper and Sujith 2018). Secondly,
this time lag often depends on factors that are difficult to simulate or model
accurately, such as jet break-up, droplet evaporation, flame kinematics, and high Reynolds
number combustion.
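
A small worked example illustrates the first point. Assuming, for illustration only, that the heat release rate responds to the velocity with a pure time delay $\tau$, the phase lag is $2\pi\tau/T$ for acoustic period $T$, so a given relative error in $\tau$ produces a phase error proportional to $\tau/T$. The function name `phase_shift_deg` is hypothetical.

```python
def phase_shift_deg(tau_over_T, relative_error):
    """Phase error in degrees caused by a relative error in the time lag."""
    return 360.0 * tau_over_T * relative_error

# A 5% error in the time lag for three ratios of time lag to acoustic period.
for ratio in (0.1, 1.0, 3.0):
    print(ratio, phase_shift_deg(ratio, 0.05))
```

Under this assumption a 5% error in the time lag shifts the phase by 54 degrees when $\tau/T = 3$, which is large enough to alter the sign of the driving in unfavourable cases.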

Rocket and aircraft engines are usually developed through component tests, sector 
tests, combustor tests, and full engine tests. The response of the flame to acoustic 
fluctuations, for example, might be measured in a well-characterized rig and then 
included in a model of the full engine. If, however, the flame’s behaviour were 
to change slightly when placed in the full engine then the model would contain 
unknown model error in a critical component. The model would remain qualitatively 
accurate but become quantitatively inaccurate and therefore misleading. Indeed, it 
is quite common for thermoacoustic instability to recur in the later stages of engine 
development, even though models compiled from component tests predicted it to be 
stable (Mongia et al. 2003). 

Encouragingly, this sensitivity also explains why thermoacoustic oscillations can 
usually be suppressed by making small design changes (Mongia et al. 2003; Oefelein 
and Yang 1993; Dowling and Morgans 2005). The challenge, of course, is to devise 
these small design changes from a quantitatively-accurate model rather than by trial 
and error. Adjoint methods combined with gradient-based optimization provide an 
excellent mechanism for this (Juniper and Sujith 2018; Magri and Juniper 2013; 
Juniper 2018; Aguilar and Juniper 2020). They rely, however, on a quantitatively 
accurate model. This chapter explores how experimental or numerical data could be 
assimilated in order to create these quantitatively-accurate models from qualitatively- 
accurate physics-based models or from physics-agnostic models. 


1.3 The Opportunity for Data-Driven Methods in 
Thermoacoustics 


All models contain parameters that are tuned to fit data. These range from qualitatively- 
accurate physics-based models with O(10!) parameters to Gaussian Process surro- 
gate models with O(10*) parameters, and to physics-agnostic neural networks with 
O(10°) parameters. The challenge is to create models that are quantitatively accurate 
with quantified uncertainties and are sufficiently constrained to be informative.¹ To
this end, all the approaches in this chapter take a Bayesian perspective and, where
possible, employ rigorous statistical inference² (MacKay 2003).


¹ Freeman Dyson (2004) quoted Fermi quoting von Neumann saying: “With four parameters I can
fit an elephant, and with five I can make him wiggle his trunk.” Fermi was referring to arbitrary
parameters rather than physics-based parameters but the general point remains that models can
become un-informative if they contain too many parameters.


² As stated in the introduction to this book: “Machine learning is statistical inference using data
collected or knowledge gained through past targeted studies or real-life experience”.
The first example is a canonical thermoacoustic system: the hot wire Rijke tube
(Rijke 1859; Saito 1965). Although simple and cheap to operate, it is difficult to
model accurately, firstly because the heat release rate is small, meaning that many
visco-thermal dissipation mechanisms are sufficiently large, in comparison, that they
must be included in the model, and secondly because the heat release rate fluctuations
at the wire cannot be measured directly. A hot wire Rijke tube is, however, easy to
automate, meaning that millions of datapoints can be obtained cheaply and elements
of the system can be moved easily (Rigas et al. 2016). Physics-based models of
the Rijke tube can therefore be constructed sequentially, mirroring data assimilation
from component tests, sector tests, combustor tests, and full engine tests in industry.
The process (MacKay 2003; Juniper and Yoko 2022) is to:


1. choose various plausible physics-based models that could explain the data;

2. tune model parameters by assimilating data from experiments;

3. quantify the uncertainties in the parameters of each model;

4. calculate the marginal likelihood (also known as the evidence) for each model,
and thereby penalise overly-complex models;

5. compare the models against each other and select the best model;

6. add the next component and assimilate more data, allowing the parameters
describing the previous components to float within constrained priors.


The second example is the assimilation of DNS and/or experimental data into 
a simplified combustion model, the G-equation (Williams 1985) with around 4000 
degrees of freedom (Hemchandra 2009). Two approaches are demonstrated. The first 
approach assimilates snapshots of the data sequentially with a Kalman filter (Evensen 
2009), refining model parameters on the fly (Yu et al. 2020). The second approach 
assimilates 10 snapshots simultaneously with a Bayesian ensemble of Deep Neural 
Networks (BayNNE) (Pearce et al. 2020). This gives almost the same results as the 
Kalman filter but is around 10° times faster. Both approaches assimilate data into 
physics-based models and obtain the expected values and uncertainties of the model 
parameters. 
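
The idea of refining a parameter estimate sequentially can be illustrated with the simplest possible case. The sketch below is a scalar, linear Kalman update for a constant parameter observed directly with noise; it is not the ensemble Kalman filter applied to the G-equation, and the prior, the noise level, and the "true" value are hypothetical.

```python
import numpy as np

def kalman_update(mean, var, obs, obs_var):
    """Assimilate one noisy observation of a constant scalar parameter."""
    gain = var / (var + obs_var)              # Kalman gain
    new_mean = mean + gain * (obs - mean)     # pull the estimate towards the data
    new_var = (1.0 - gain) * var              # the uncertainty shrinks
    return new_mean, new_var

rng = np.random.default_rng(1)
true_value, obs_var = 2.0, 0.25               # hypothetical parameter and noise variance
mean, var = 0.0, 10.0                         # broad prior
for _ in range(200):
    obs = true_value + rng.normal(0.0, np.sqrt(obs_var))
    mean, var = kalman_update(mean, var, obs, obs_var)
print(mean, np.sqrt(var))
```

Each update returns both an expected value and a variance, mirroring the statement that the assimilation yields the model parameters together with their uncertainties.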

The third example is the assimilation of experimental data into physics-agnostic
models. The models are trained to recognize how close a thermoacoustic system is
to instability from the noise that it emits (Sengupta et al. 2021; Waxenegger-Wilfing
et al. 2021; McCartney et al. 2022). As for the first two examples, a Bayesian approach
is used so that the model can output its certainty about its prediction. This
physics-agnostic approach is compared with model-based approaches quantified by the Hurst
exponent (Nair et al. 2014), the permutation entropy (Kobayashi et al. 2017), and the
autocorrelation decay (Lieuwen and Banaszuk 2005), which are based on a priori
assumptions of how the noise signal will change as instability approaches.
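
To make one of these precursor measures concrete, the following sketch computes a normalized permutation entropy for two synthetic signals. It is a generic textbook-style implementation under assumed settings (pattern length 3, unit delay) and is not the procedure of the cited studies. The underlying assumption is that a signal dominated by a periodic oscillation has a lower permutation entropy than broadband noise.

```python
import math
from collections import Counter
import numpy as np

def permutation_entropy(signal, order=3, delay=1):
    """Normalized permutation entropy (0 = fully regular, 1 = fully random)."""
    n = len(signal) - (order - 1) * delay
    # Count the ordinal pattern of every window of `order` samples.
    patterns = Counter(
        tuple(np.argsort(signal[i:i + order * delay:delay])) for i in range(n)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n
    return float(-np.sum(probs * np.log(probs)) / math.log(math.factorial(order)))

rng = np.random.default_rng(0)
noise = rng.normal(size=2000)                          # broadband "combustion noise"
tone = np.sin(2 * np.pi * np.arange(2000) / 100.0)     # pure periodic oscillation
pe_noise = permutation_entropy(noise)
pe_tone = permutation_entropy(tone)
print(pe_noise, pe_tone)
```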

Other examples of the application of Machine Learning to Thermoacoustics are 
in learning the nonlinear flame response with Neural Networks (Jaensch and Polifke 
2017; Tathawadekar et al. 2021), identifying nonlinear flame describing functions 
(McCartney et al. 2020), modelling the flame impulse response from LES with a 
Gaussian Process surrogate model (Kulkarni et al. 2021), and the use of Gaussian 
Processes for Uncertainty Quantification (Guo et al. 2021a). 
2 Physics-Based Bayesian Inference Applied to a Complete
System


Physics-based Bayesian inference starts from a set of physics-based candidate models
$H_i$, each of which has a set of model parameters $\mathbf{a}$. For thermoacoustic systems,
typical model parameters would be physical dimensions, temperatures, reflection
coefficients, and a flame transfer function. Data, $D$, arrive and, at the first level of
inference, we find the parameters of each model that are most likely to explain the
data (MacKay 2003, Sect. 2.6). For thermoacoustic systems, typical data would be
temperatures, pressure fluctuations, or natural emission fluctuations. We start from
the product rule of probability:

$$P(\mathbf{a}, D | H_i) = P(\mathbf{a} | D, H_i)\, P(D | H_i) = P(D | \mathbf{a}, H_i)\, P(\mathbf{a} | H_i) \tag{1}$$

where $P(\mathbf{a}|H_i)$ is our prior assumption about the probability of the parameters,
$\mathbf{a}$, given the model $H_i$. Bayesian inference requires us to impose prior values for
the model parameters and their uncertainties. This is appropriate because we usually
know the model parameters approximately from previous experiments and will
become increasingly certain about them as an experimental campaign progresses.
The term $P(D|\mathbf{a}, H_i)$ contains the data, $D$, which is fixed by the experiment, and
the parameters, $\mathbf{a}$, which we wish to obtain for model $H_i$. For given $D$, the term
$P(D|\mathbf{a}, H_i)$ defines the likelihood of the parameters, $\mathbf{a}$, of model $H_i$ (MacKay 2003,
p. 29). This likelihood does not have to sum to 1 because the proposed models $H_i$
are not mutually exclusive or exhaustive. On the other hand, for a given model $H_i$
and parameters, $\mathbf{a}$, the term $P(D|\mathbf{a}, H_i)$ defines the probability of the data, which
does have to sum to 1. This distinction becomes important when incorporating
measurement noise.

The term $P(D|H_i)$ is the evidence for the model. This is the RHS of (1) integrated
(also known as marginalized) over all parameter values:

$$P(D | H_i) = \int_{\mathbf{a}} P(D | \mathbf{a}, H_i)\, P(\mathbf{a} | H_i)\, \mathrm{d}\mathbf{a} \tag{2}$$

which is known as the marginal likelihood. At the first level of inference, this quantity
has no significance because we simply find $\mathbf{a}$ that maximizes $P(\mathbf{a}|D, H_i)$ for a given
model $H_i$. It is used in the second level of inference, in which we compare the
marginal likelihoods of different candidate models.
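
For a model with a single parameter, Eqs. (1) and (2) can be evaluated by brute force on a grid, which makes the roles of prior, likelihood, evidence, and posterior explicit. The sketch below uses a hypothetical model (prediction equal to $a x$), synthetic data, and an assumed Gaussian prior and noise level; none of these values come from the Rijke tube experiments.

```python
import numpy as np

# Hypothetical one-parameter model: prediction = a * x.
x = np.linspace(0.0, 1.0, 11)
rng = np.random.default_rng(2)
noise_std = 0.1
data = 1.5 * x + rng.normal(0.0, noise_std, x.size)

a_grid = np.linspace(-2.0, 5.0, 7001)             # candidate parameter values
da = a_grid[1] - a_grid[0]

# Prior P(a|H): Gaussian centred on 1 with unit standard deviation.
prior = np.exp(-0.5 * (a_grid - 1.0) ** 2) / np.sqrt(2 * np.pi)

# Likelihood P(D|a,H): independent Gaussian measurement noise.
resid = data[None, :] - a_grid[:, None] * x[None, :]
likelihood = np.exp(-0.5 * np.sum(resid**2, axis=1) / noise_std**2)
likelihood /= (2 * np.pi * noise_std**2) ** (x.size / 2)

evidence = np.sum(likelihood * prior) * da        # Eq. (2) by quadrature
posterior = likelihood * prior / evidence         # Eq. (1) rearranged
a_mp = a_grid[np.argmax(posterior)]               # most probable parameter value
print(a_mp, evidence)
```

The grid approach becomes impractical beyond a few parameters, which motivates the sampling methods and the Gaussian approximation discussed next.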

The experiments in this section are performed on a vertical Rijke tube containing
an electric heater, which is moved through 19 different positions from the bottom
end of the tube (Juniper and Yoko 2022; Garita et al. 2021; Garita 2021). The heater
power is set to each of eight different values and held until the system reaches steady
state. Then a loudspeaker at the base of the tube forces the system close to its resonant
frequency and probe microphones measure the response throughout the tube. We
assimilate the decay rates, $S_r$, frequencies, $S_i$, and relative pressures of the
microphones, $(P_r, P_i)$, into a thermoacoustic network model.


2.1 Laplace’s Method 


The most likely parameters, a, and their uncertainties can be found with sampling 
methods such as Markov Chain Monte Carlo (Metropolis et al. 1953; MacKay 2003) 
or Hamiltonian Monte Carlo. These sample the posterior probability distribution 
through a random walk. They can be applied to this thermoacoustic problem (Garita 
2021) but are quite slow. The assimilation process can be accelerated greatly by 
assuming that all the probability distributions are Gaussian (MacKay 2003, Chap. 27). 
The prior probability distribution, which must integrate to 1, is then: 


$$P(\mathbf{a} | H_i) = \frac{1}{\sqrt{(2\pi)^{N_a} |C_{aa}|}} \exp\left\{ -\frac{1}{2} (\mathbf{a} - \mathbf{a}_p)^T C_{aa}^{-1} (\mathbf{a} - \mathbf{a}_p) \right\} \tag{3}$$

where $N_a$ is the number of parameters, $\mathbf{a}_p$ are their prior expected values and $C_{aa}$ is
their prior covariance matrix. We assume that, for a given model $H_i$ with parameters
$\mathbf{a}$, the measurements $D$ are normally-distributed around the model predictions $D(\mathbf{a})$:

$$P(D | \mathbf{a}, H_i) = \frac{1}{\sqrt{(2\pi)^{N_D} |C_{DD}|}} \exp\left\{ -\frac{1}{2} (D(\mathbf{a}) - D)^T C_{DD}^{-1} (D(\mathbf{a}) - D) \right\} \tag{4}$$

where $N_D$ is the number of datapoints and $C_{DD}$ is a diagonal matrix containing the
variance of each measurement. In this example, epistemic uncertainty such as model
error and systematic measurement error is included within $C_{DD}$.
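
The Gaussian prior (3) and likelihood (4) share the same functional form, so a single helper suffices in code. The sketch below works with logarithms to avoid underflow; the function names and the callable `model`, which stands for any routine returning the predictions $D(\mathbf{a})$, are hypothetical.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log of a multivariate normal density, as in Eqs. (3) and (4)."""
    x, mean, cov = np.atleast_1d(x), np.atleast_1d(mean), np.atleast_2d(cov)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)            # assumes cov is positive definite
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (quad + logdet + x.size * np.log(2 * np.pi))

def log_prior(a, a_prior, C_aa):
    """log P(a|H): parameters distributed around their prior expected values."""
    return log_gaussian(a, a_prior, C_aa)

def log_likelihood(a, model, data, C_DD):
    """log P(D|a,H): measurements distributed around the predictions model(a)."""
    return log_gaussian(model(a), data, C_DD)

# Usage with a hypothetical two-parameter model that simply doubles a.
a = np.array([1.0, 2.0])
lp = log_prior(a, np.array([1.0, 2.0]), np.diag([4.0, 9.0]))
ll = log_likelihood(a, lambda p: 2.0 * p, np.array([2.0, 4.0]), np.diag([0.01, 0.01]))
print(lp, ll)
```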

We define $J$ to be the negative log of the RHS of (1):

$$J = -\log\left\{ P(D | \mathbf{a}, H_i)\, P(\mathbf{a} | H_i) \right\} \tag{5}$$

so that the most probable parameter values, $\mathbf{a}_{mp}$, are found by minimizing $J$ using
an optimization algorithm. The RHS of (1) is the product of two Gaussians (3), (4),
meaning that the posterior likelihood of the parameters, $P(\mathbf{a}|D, H_i)$, is a Gaussian
centred around $\mathbf{a}_{mp}$:

$$-\log\left\{ P(\mathbf{a} | D, H_i) \right\} = \frac{1}{2} (\mathbf{a} - \mathbf{a}_{mp})^T A\, (\mathbf{a} - \mathbf{a}_{mp}) + \text{constant} \tag{6}$$

where $A$ is the inverse of the posterior covariance matrix which, by inspection, is the
Hessian of $J$:

$$A_{ij} = \frac{\partial^2 J}{\partial a_i \partial a_j} \tag{7}$$
The posterior uncertainty in the parameters, $A^{-1}$, is therefore calculated cheaply. The
integral (2), which can be prohibitively expensive to calculate without the Gaussian
assumption, is now simply:

$$P(D | H_i) = P(D | \mathbf{a}_{mp}, H_i)\, P(\mathbf{a}_{mp} | H_i)\, \left(\det(A / 2\pi)\right)^{-1/2} \tag{8}$$

This integral allows us to rank different models, $H_i$. By the product rule of probability,
$P(H_i|D) P(D) = P(D|H_i) P(H_i)$. If the prior probability, $P(H_i)$, is the same
for each model then the models can be ranked by $P(D|H_i)$. The fact that (8) is
proportional to $\det(A)^{-1/2}$ penalizes models for which $\det(A)$ is large. This tends to
favour models with fewer parameters (hence smaller $A$) even if they do not fit the
data as well as models with more parameters. This does not, of course, prevent a
model with many parameters from being the highest ranked, as long as the model
fits the data well and the measurement uncertainty is small.
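
Laplace's method can be sketched compactly for models that are linear in their parameters, for which the minimizer of $J$ and its Hessian are available in closed form and Eq. (8) is exact. The example below is an illustration under these assumptions, with synthetic straight-line data and three hypothetical polynomial candidate models; it is not the thermoacoustic network model of this section.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log of a multivariate normal density, as in Eqs. (3) and (4)."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet
                   + x.size * np.log(2 * np.pi))

def laplace_log_evidence(X, data, C_DD, a_prior, C_aa):
    """Return (a_mp, A, log P(D|H)) for the linear model D(a) = X @ a."""
    C_DD_inv = np.linalg.inv(C_DD)
    C_aa_inv = np.linalg.inv(C_aa)
    A = X.T @ C_DD_inv @ X + C_aa_inv                 # Hessian of J, Eq. (7)
    rhs = X.T @ C_DD_inv @ data + C_aa_inv @ a_prior
    a_mp = np.linalg.solve(A, rhs)                    # minimizer of J
    _, logdet = np.linalg.slogdet(A / (2 * np.pi))
    log_ev = (log_gaussian(X @ a_mp, data, C_DD)      # best-fit likelihood
              + log_gaussian(a_mp, a_prior, C_aa)     # prior at a_mp
              - 0.5 * logdet)                         # determinant factor, Eq. (8)
    return a_mp, A, log_ev

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 20)
data = 0.5 + 2.0 * x + rng.normal(0.0, 0.05, x.size)  # synthetic straight line
C_DD = 0.05**2 * np.eye(x.size)

models = {                                             # hypothetical candidates
    "constant": np.column_stack([np.ones_like(x)]),
    "linear": np.column_stack([np.ones_like(x), x]),
    "cubic": np.column_stack([x**k for k in range(4)]),
}
log_evidence = {}
for name, X in models.items():
    n_par = X.shape[1]
    a_mp, A, log_evidence[name] = laplace_log_evidence(
        X, data, C_DD, np.zeros(n_par), 10.0**2 * np.eye(n_par))
    print(name, log_evidence[name])
```

Ranking the candidates by `log_evidence` corresponds to the second level of inference. For models that are nonlinear in their parameters, $\mathbf{a}_{mp}$ would have to be found iteratively and Eq. (8) would hold only approximately.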


2.2 Accelerating Laplace’s Method with Adjoint Methods 


If all probability distributions are assumed to be Gaussian then $J$ is the sum of
the squares of the discrepancies between the model predictions and the experimental
measurements, weighted by our confidence in the experimental measurements, added
to the sum of the squares of the discrepancies between the model parameters and
their prior estimates, weighted by our confidence in the prior estimates:

$$
\begin{aligned}
J &= -\log\left\{ P(D | \mathbf{a}, H_i)\, P(\mathbf{a} | H_i) \right\} \\
  &= \tfrac{1}{2} (S_r(\mathbf{a}) - S_r)^T C_{S_r}^{-1} (S_r(\mathbf{a}) - S_r) \\
  &\quad + \tfrac{1}{2} (S_i(\mathbf{a}) - S_i)^T C_{S_i}^{-1} (S_i(\mathbf{a}) - S_i) \\
  &\quad + \tfrac{1}{2} (P_r(\mathbf{a}) - P_r)^T C_{P_r}^{-1} (P_r(\mathbf{a}) - P_r) \\
  &\quad + \tfrac{1}{2} (P_i(\mathbf{a}) - P_i)^T C_{P_i}^{-1} (P_i(\mathbf{a}) - P_i) \\
  &\quad + \tfrac{1}{2} (\mathbf{a} - \mathbf{a}_p)^T C_{aa}^{-1} (\mathbf{a} - \mathbf{a}_p) + \ldots
\end{aligned}
\tag{9}
$$

By inspection, the Jacobian and Hessian of $J$ contain $\partial \mathcal{L} / \partial a_i$ and
$\partial^2 \mathcal{L} / \partial a_i \partial a_j$ respectively, where $\mathcal{L}$ refers to the model predictions
$S_r(\mathbf{a})$, $S_i(\mathbf{a})$, $P_r(\mathbf{a})$, and $P_i(\mathbf{a})$. These first and second derivatives can be
found cheaply with first (Magri and Juniper 2013) and second (Tammisola et al.
2014; Magri et al. 2016) order adjoint methods. The remaining terms in $J$ contain
the normalizing factors in (3), (4). The derivatives w.r.t. the measurement
uncertainties can also be calculated and one can then optimize to find the measurement noise
that maximizes the posterior likelihood. In this example, the epistemic uncertainty
is embedded within the measurement noise, so assimilating the measurement noise
also assimilates the epistemic uncertainty.
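
The first-order adjoint idea can be illustrated on a small eigenvalue problem. For an operator $M(a)$ with eigenvalue $s$, right eigenvector $\mathbf{x}$ and left (adjoint) eigenvector $\mathbf{y}$, the standard result is $\mathrm{d}s/\mathrm{d}a = \mathbf{y}^H (\partial M/\partial a)\, \mathbf{x} / (\mathbf{y}^H \mathbf{x})$, so the sensitivity to every parameter follows from one direct and one adjoint solve. The 2 × 2 matrices below are hypothetical and are not taken from the thermoacoustic network model.

```python
import numpy as np

def eigenvalue_sensitivity(M, dM_da, pick):
    """First-order sensitivity ds/da of one eigenvalue of M(a).

    Uses the right eigenvector x and the left (adjoint) eigenvector y:
    ds/da = (y^H dM/da x) / (y^H x).
    """
    s_all, right = np.linalg.eig(M)
    k = pick(s_all)
    s, x = s_all[k], right[:, k]
    # Left eigenvectors of M are right eigenvectors of M^H (eigenvalue conj(s)).
    sH_all, left = np.linalg.eig(M.conj().T)
    y = left[:, np.argmin(np.abs(sH_all - np.conj(s)))]
    return s, (y.conj() @ dM_da @ x) / (y.conj() @ x)

# Hypothetical 2x2 operator M(a) = M0 + a * M1 with a damped oscillatory mode.
M0 = np.array([[-1.0, 2.0], [-2.0, -0.5]])
M1 = np.array([[0.3, 0.0], [0.1, -0.2]])
pick = lambda s: np.argmax(s.imag)            # track the mode with Im(s) > 0

a, h = 0.4, 1e-6
s, ds_da = eigenvalue_sensitivity(M0 + a * M1, M1, pick)
# Check against a central finite difference, which needs two extra solves.
s_plus = np.linalg.eig(M0 + (a + h) * M1)[0]
s_minus = np.linalg.eig(M0 + (a - h) * M1)[0]
fd = (s_plus[pick(s_plus)] - s_minus[pick(s_minus)]) / (2 * h)
print(ds_da, fd)
```

The real and imaginary parts of `ds_da` play the role of the derivatives of the growth rate and frequency with respect to a parameter, which enter the Jacobian of $J$ through the chain rule.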

Adjoint codes require a careful code structure and must avoid non-differentiable
operators. The code used here consists of a low level thermoacoustic network model
that contains floating parameters to quantify all possible local feedback mechanisms
(Juniper 2018). The gradients of $(S_r, S_i)$ and $(P_r, P_i)$ w.r.t. all possible feedback
mechanisms are calculated. These mechanisms are then ascribed physical meaning by
candidate models and the gradients w.r.t. each model’s parameters are extracted. The low
level function is called by a mid-level function that calculates $J$ and all its gradients. In
turn this is called by a high level function that converges to $\mathbf{a}_{mp}$ and then calculates
the likelihoods and marginal likelihoods using Laplace’s method. A separate high
level function performs Markov Chain Monte Carlo by calling the same mid-level
and low-level functions. The code is available at Juniper (2022).


2.3 Applying Laplace’s Method to a Complete 
Thermoacoustic System 


Matveev (2003) set out to create a quantitatively-accurate model of the hot wire 
Rijke tube by compiling quantitatively-accurate models of its components from the 
literature. Despite being tuned to be quantitatively correct at one heater position, this 
carefully-constructed model is only qualitatively correct at nearby heater positions 
(Matveev 2003, Figs. 5-5 to 5-8). This demonstrates the danger of relying on quanti- 
tative models from the literature: these models may have been quantitatively correct 
for the reported experiment, but they are probably only qualitatively correct for other 
experiments. The Bayesian inference demonstrated in this section uses qualitative 
models from the literature but, crucially, allows their parameters to float in order to 
match the new experiment at all operating points. As will be shown later, this creates 
a quantitatively-accurate model over the entire range studied and, if the model is 
physically-correct, it can extrapolate beyond the range studied. 

Developing a quantitatively accurate model of the hot wire Rijke tube is challeng- 
ing because the heat release rate is small and therefore the thermoacoustic driving 
mechanism is weak. For the experiment shown here, which is taken from Juniper 
and Yoko (2022), the thermoacoustic mechanism contributes around +10 rad s⁻¹ to
the growth rate and +100 rad s⁻¹ to the frequency. For comparison, Fig. 1 shows the
decay rate (negative growth rate) and frequency of acoustic oscillations in the cold 
Rijke tube (i) when empty, (ii) with the heater prongs in place, (iii) with the heater 
and prongs in place, and (iv) with the heater, prongs, and thermocouples in place. 
The growth rate and frequency drifts caused by these elements of the rig, even when 
the heater is off, are a similar size to the thermoacoustic effect and cannot be ignored 
in a quantitative model. These elements must be modelled but, even after reading the 
extensive literature on the Rijke tube, such as Feldman (1968), Raun et al. (1993), and
Bisio and Rubatto (1999) and the references within them, it is not evident a priori
which physical mechanisms must be included and which can be neglected. Instead, 
we propose several physics-based models, assimilate the data into those models using 
Laplace’s method combined with adjoint methods, and then select the model with the 
highest marginal likelihood because it is the one that is best supported by the experimental
data. For example, Table 1 shows the best fit likelihood, P(D|a_MP, H_i), and the marginal
likelihood, P(D|H_i), for seven candidate models of the heater prongs. These models contain
various combinations of the viscous boundary layer, the thermal boundary layer, and the
blockage caused by the prongs, as described in the caption. The best data fit is achieved by
model 4 but the highest marginal likelihood is achieved by model 6, which fits the data well
with just two parameters. Model 6 contains the blockage caused by the prongs and the
visco-thermal drag of the prongs’ boundary layers, which is expressed as a real multiple of
the visco-thermal drag of the tube’s boundary layers. It is reassuring that the model with
the highest marginal likelihood contains all the expected physics but remains simple.

Fig. 1 Expected values (±2 standard deviations) of model predictions D(a) versus experimentally
measured values (±2 standard deviations) D of the growth rates and frequencies of the cold Rijke
tube in four configurations: (i) empty tube; (ii) tube containing heater prongs; (iii) tube containing
heater prongs and heater; (iv) tube containing heater prongs, heater, and thermocouples. Image
adapted from Juniper and Yoko (2022)

Table 1 log(Best Fit Likelihood) per datapoint and log(Marginal Likelihood) per datapoint for
seven models of the heater prongs in the cold Rijke tube. The second column contains the number
of parameters in each model. The third column describes how the viscous boundary layer on the
prongs is modelled: it is the viscous dissipation in the tube’s boundary layer multiplied by a real
number, a complex number, or zero. The fourth column is the equivalent for the thermal boundary
layer. Where a single entry spans the third and fourth columns, the same factor is used for both
the viscous and thermal boundary layers. The fifth column notes whether the blockage of the prongs
is included in the model. Model 4 gives the best fit to the data but is not the most likely model.
Model 6 is the most likely model (highest marginal likelihood) because it achieves a good data fit
with just two model parameters. (Table adapted from Juniper and Yoko 2022)

Model  Params  Viscous b.l.  Thermal b.l.  Blockage  log(BFL)  log(ML)
1      1       Real (same factor)          No        -0.3622   -0.7183
2      2       Complex (same factor)       No        -0.3549   -0.9117
3      2       Real          Real          No        -0.3529   -0.7683
4      4       Complex       Complex       No        +0.9360   -0.2949
5      1       Zero          Zero          Yes       -3.6955   -3.8696
6      2       Real (same factor)          Yes       +0.6781   +0.1010
7      3       Real          Real          Yes       +0.7099   -0.1096
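
The selection rule described above can be sketched in a few lines of Python. This is only an illustration that re-uses the values quoted in Table 1; the dictionary layout and the name occam_penalty are assumptions made here and are not taken from the code of Juniper (2022).

```python
# Values per datapoint from Table 1: (number of parameters, log(BFL), log(ML)).
table1 = {
    1: (1, -0.3622, -0.7183),
    2: (2, -0.3549, -0.9117),
    3: (2, -0.3529, -0.7683),
    4: (4, +0.9360, -0.2949),
    5: (1, -3.6955, -3.8696),
    6: (2, +0.6781, +0.1010),
    7: (3, +0.7099, -0.1096),
}

# Model with the best data fit (highest best fit likelihood).
best_fit_model = max(table1, key=lambda m: table1[m][1])

# Model that is best supported by the data (highest marginal likelihood).
most_likely_model = max(table1, key=lambda m: table1[m][2])

# Log-likelihood lost when marginalizing over the parameters (hypothetical name).
occam_penalty = {m: bfl - ml for m, (_, bfl, ml) in table1.items()}

print("best fit:", best_fit_model, "most likely:", most_likely_model)
for m, penalty in occam_penalty.items():
    print(f"model {m}: {table1[m][0]} parameters, penalty {penalty:.4f}")
```

The gap between the two columns is largest for model 4, which has the most parameters, and this is why the model with the best fit is not the one selected.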

This process is repeated for the heater itself and the thermocouples (Juniper and 
Yoko 2022) until a quantitatively-accurate model of the cold Rijke tube has been 
created. Figure 1 shows the model predictions and experimental measurements for
the final model. This model is quantitatively accurate across the entire operating 
range with just a handful of parameters (Juniper and Yoko 2022). Using Laplace’s 
method, accelerated by first and second order adjoint methods, this data assimilation 
takes a few seconds on a laptop. Using MCMC takes around 1000 times longer on 
a workstation (Garita 2021). Although time-consuming, MCMC can be useful in 
order to confirm that the posterior likelihood distributions are close to Gaussian, 
which justifies the use of Laplace’s method. 
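
As an illustration of why a Gaussian posterior justifies Laplace’s method, the sketch below evaluates the marginal likelihood of a toy linear model with a Gaussian prior and Gaussian noise, for which the posterior is exactly Gaussian and Laplace’s method is therefore exact. The toy data, the two candidate models, and all function names are assumptions made for this example; this is not the thermoacoustic network model, and it uses a direct linear solve rather than adjoint methods.

```python
import numpy as np

def laplace_log_evidence(X, y, a_p, C_aa, sigma):
    """Return log best fit likelihood and Laplace estimate of log P(D|H) for y = X a + noise."""
    n, k = X.shape
    C_inv = np.linalg.inv(C_aa)
    A = X.T @ X / sigma**2 + C_inv                      # Hessian of -log(posterior)
    a_mp = np.linalg.solve(A, X.T @ y / sigma**2 + C_inv @ a_p)
    r = y - X @ a_mp
    log_bfl = -0.5 * (r @ r) / sigma**2 - 0.5 * n * np.log(2 * np.pi * sigma**2)
    d = a_mp - a_p
    log_prior = -0.5 * (d @ C_inv @ d) - 0.5 * np.linalg.slogdet(2 * np.pi * C_aa)[1]
    log_ml = log_bfl + log_prior + 0.5 * k * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(A)[1]
    return log_bfl, log_ml

def exact_log_evidence(X, y, a_p, C_aa, sigma):
    """Closed-form log P(D|H): the data are Gaussian with covariance sigma^2 I + X C_aa X^T."""
    S = sigma**2 * np.eye(len(y)) + X @ C_aa @ X.T
    r = y - X @ a_p
    return -0.5 * (r @ np.linalg.solve(S, r)) - 0.5 * np.linalg.slogdet(2 * np.pi * S)[1]

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
sigma = 0.05
y = 1.5 * x + rng.normal(0.0, sigma, x.size)            # synthetic data from a linear law

models = {"linear": np.column_stack([x]),
          "cubic": np.column_stack([x, x**2, x**3])}
results = {}
for name, X in models.items():
    k = X.shape[1]
    log_bfl, log_ml = laplace_log_evidence(X, y, np.zeros(k), np.eye(k), sigma)
    exact = exact_log_evidence(X, y, np.zeros(k), np.eye(k), sigma)
    results[name] = (log_bfl, log_ml, exact)
    print(f"{name}: log(BFL) = {log_bfl:.3f}, log(ML) Laplace = {log_ml:.3f}, exact = {exact:.3f}")
```

For a nonlinear model the two estimates would differ, and a comparison with MCMC samples plays the role of the closed-form check used here.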

The fluctuating heat release rate at the wire cannot be measured directly. Analytical
relationships between velocity fluctuations and heat release rate fluctuations have
been developed (King 1914; Lighthill 1954; Carrier 1955; Merk 1957) but subsequent
numerical simulations (Witte and Polifke 2017) have shown that numerically-calculated
relationships have a more intricate dependence on Re and St than can be derived
analytically. Since the 1970s (Bayly 1986), researchers have therefore tended to use
CFD simulations or simple relations that are tuned to a particular operating point
(Witte 2018, Table 1; Ghani et al. 2020).

Table 2 log(Best Fit Likelihood) per datapoint and log(Marginal Likelihood) per datapoint for
nine models of the heater in the hot Rijke tube. Model parameters are denoted as k with a numerical
index. k_c are the model parameters from the cold experiments, which are fixed. The second column
contains the number of parameters in each model. The third and fourth columns describe how the
magnitude and phase of the fluctuating heat release rate are modelled. Q_h is the heater power and
Q_King is adjusted for King’s law (King 1914; Juniper and Yoko 2022). The fifth and sixth columns
describe how the visco-thermal drag at the heater is modelled, where s_i is the angular frequency
and τ_L is Lighthill’s time delay (Lighthill 1954). (Table adapted from Juniper and Yoko 2022)

Model  Params  Magnitude     Phase           Viscous                  Thermal                  log(BFL)  log(ML)
                                             (k_c = cold value)       (k_c = cold value)
1      2       k1 × Q_h      k2              k_c                      k_c                      -4.6018   -4.7039
2      2       k1 × Q_h      k2 × s_i        k_c                      k_c                      -4.4942   -4.5976
3      3       k1 × Q_h^k3   k2              k_c                      k_c                      -4.5567   -4.6960
4      2       k1 × Q_King   k2              k_c                      k_c                      -4.5926   -4.6991
5      2       k1 × Q_King   k2 × s_i        k_c                      k_c                      -4.5670   -4.6750
6      2       k1 × Q_King   k2 × s_i τ_L    k_c                      k_c                      -5.6770   -5.7794
7      4       k1 × Q_h      k2 × s_i        k_c + (k3 + i k4) × Q_h  k_c                      -3.3439   -3.6113
8      6       k1 × Q_h      k2 × s_i        k_c + (k3 + i k4) × Q_h  k_c + (k5 + i k6) × Q_h  -3.1981   -3.5952
9      6       k1 × Q_h      k2              k_c + (k3 + i k4) × Q_h  k_c + (k5 + i k6) × Q_h  -3.5735   -3.9589

Here we propose six candidate models for the heat release rate and two candi- 
date models for how the thermo-viscous drag of the heater changes with the heater 
power. We then calculate the marginal likelihoods of these models, allowing the 
measurement noise to float in order to accommodate epistemic uncertainty such as 
systematic measurement error and model error. Table 2 shows the candidate models, 
their assimilated parameters, their log best fit likelihood (BFL) per datapoint, and 
their log marginal likelihood per datapoint. Model 8 has the highest marginal
likelihood. In this model, the fluctuating heat release rate is proportional to the steady
heat release rate; the time delay between velocity perturbations and subsequent heat 
release rate perturbations is the same for all configurations, and the thermo-viscous 
drag of the heater element is proportional to the heater power. There is, of course, no 
limit to the number of models that can be tested. The interested reader is encouraged 
to generate and test their own models using the code (Juniper 2022).

Fig. 2 Expected values of model 8’s predictions D(a) versus experimental measurements D of the
growth rates and frequencies of the hot Rijke tube, as a function of heater power and heater position.
The model parameters are obtained by assimilating data from all 105 experimental configurations.
The model is quantitatively accurate over the entire operating range. (Image adapted from Juniper
and Yoko 2022)


Figure 2 shows the experimental measurements versus the predictions of model 8
for the growth rates and frequencies when assimilating data from all 105 experiments. 
The agreement is excellent, particularly for the growth rate, which is more practi- 
cally important than the frequency. Figure 3 is the same as Fig. 2 but is obtained by 
assimilating data from just 8 of the 105 experiments. The results are almost indistin- 
guishable, which shows that, once a good physics-based model has been identified, 
very little data is required to tune its parameters. This model can then extrapolate 
to other operating points, even if they are far from those already examined. This is 
a desirable feature of any model and shows the advantage of assimilating data into 
physics-based models with a handful of parameters, rather than physics-agnostic 
models with many parameters, which would not be able to extrapolate. 

As a final comment, this assimilation of experimental data with rigorous Bayesian 
inference forces the experimentalist to design informative experiments. Firstly, with- 
out an excellent initial guess for the parameter values, it is almost impossible to 
assimilate all the parameters simultaneously. This encourages the experimentalist 
to assimilate the parameters sequentially with an experimental campaign in which 
some of the parameters take known values (usually zero) in some of the experiments. 
Secondly, this process reveals systematic measurement error that was previously
unknown to the experimentalist. This epistemic error is revealed when the parameters
shift to absorb the error and seem to uncover impossible physical behaviour.³ Once
this systematic measurement error becomes known, the experimentalist is forced to
remove it or avoid it with good experimental design.

Fig. 3 As for Fig. 2 but when the model parameters are obtained by assimilating data from the eight
circled configurations. This model is also quantitatively accurate over the entire operating range,
showing that this model can extrapolate beyond the assimilated datapoints. (Image adapted from
Juniper and Yoko 2022)


3 Physics-Based Statistical Inference Applied to a Flame 


The most influential element of any thermoacoustic system is the response of the 
flame to acoustic forcing. This is also the hardest element to model. In this section, 
experimental images of forced flames are assimilated into a physics-based model 
using the first level of inference described in Sect. 2. The physics-based model can
then be used in thermoacoustic analysis, for example (i) in nonlinear simulations, (ii)
to create a nonlinear flame describing function (FDF), or (iii) to create a linear flame 
transfer function (FTF). 


³ As the OPERA team found in 2012, it is wise to search for systematic error before publishing
results, however eye-catching they seem (Brumfiel 2012).


3.1 Assimilating Experimental Data with an Ensemble
Kalman Filter


We take our model, H, to be a partial differential equation (PDE) discretized onto a
grid, with unknown parameters a. As before, we wish to infer the unknown parameters,
a, by assimilating data, D, from an experiment. The model, which has state ψ, is
marched forwards in time from some initial condition to produce a model prediction,
D(ψ), that can be compared with the experimental measurements, D, over some time
period T. In principle, it is possible to use the same method as in Sect. 2.1 to iterate
to the values of a that minimize an appropriate J for all the data simultaneously. This
requires the model predictions, D(ψ), and their gradients w.r.t. all parameters, a_i, to
be stored at all moments at which they are compared with the data D. This is not
practical because it would require too much storage. This section describes an
alternative approach that requires less storage.

We consider a level set model of a premixed laminar flame, taken from Yu et al.
(2020). The state, ψ, is the flame position, and the parameters, a, are the flame
aspect ratio β, the Markstein length L, the ratio, K, between the mean flow speed
and the phase speed of perturbations down the flame, the amplitude, ε, of velocity
perturbations, and the parabolicity parameter, α, of the base flow, where
U/Ū = 1 + α(1 − 2(r/R)²). The parameters β, L, and α are inferred from an image of an
unforced steady premixed Bunsen flame. This flame is then forced at 200, 300, and
400 Hz, and the data, D, are experimental images taken at 2800 Hz. The state, ψ, is
marched forward in time by the model, H, with parameters a to an assimilation step.
At the assimilation step, the model prediction D(ψ) is compared with the data D,
and the state ψ and remaining parameters a are both updated to statistically optimal
estimates, as described in the next paragraph. The state, ψ, is then marched forward
to the next assimilation step and the process is repeated until the parameters a have
converged.

If the evolution were linear or weakly nonlinear then a Kalman filter or extended
Kalman filter would be appropriate. The evolution is highly nonlinear, however, with
wrinkles and cusps forming at the flame. We therefore use an ensemble Kalman filter
(EnKF) in which we generate an ensemble of N states ψ_i from the model H with
different parameter values a_i (Evensen 2009). At each assimilation step, we append
each parameter vector a_i to its state vector ψ_i to form an augmented state Ψ_i. The
expected value Ψ̄ and covariance C_ΨΨ of the augmented state Ψ are then derived
from the ensemble:

\[
\bar{\Psi} = \frac{1}{N} \sum_{i=1}^{N} \Psi_i \tag{10}
\]

\[
C_{\Psi\Psi} = \frac{1}{N-1} \sum_{i=1}^{N} (\Psi_i - \bar{\Psi})(\Psi_i - \bar{\Psi})^T \tag{11}
\]

The expected value Ψ̄ becomes the prior expected value and replaces a_p in (3).
The covariance C_ΨΨ becomes the prior covariance and replaces C_aa in (3). The
predicted flame position D(ψ) is found from the expected state, Ψ̄. The discrepancy
between the experimental flame position D and the model prediction D(ψ) is then
combined with an estimate of the measurement error C_DD in (4). The posterior
augmented state Ψ_MP and its inverse covariance A are calculated to be those that
maximize the RHS of (1), as in Sect. 2.1. The state ψ and parameters a are extracted
from the expected value of the posterior augmented state. N states are created with
this posterior expected value and covariance, and the process is repeated.
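
The assimilation loop described above can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions, not the level set solver of Yu et al. (2020): the model is a hypothetical scalar equation dx/dt = 1 - a x with one unknown parameter a, the observation operator is linear (only x is observed), and the posterior is obtained with the standard Kalman update, which gives the maximum of the Gaussian posterior in that linear case. All names and numerical values are chosen for this example.

```python
import numpy as np

def forecast(ensemble, dt=0.1):
    """March each augmented member [x, a] one step of the toy model dx/dt = 1 - a*x."""
    x, a = ensemble[:, 0], ensemble[:, 1]
    return np.column_stack([x + dt * (1.0 - a * x), a])

def assimilate(ensemble, d, H, C_dd, rng):
    """Gaussian prior from the ensemble, Kalman update, then resample N members."""
    n_members = ensemble.shape[0]
    mean = ensemble.mean(axis=0)                        # ensemble mean, as in (10)
    dev = ensemble - mean
    cov = dev.T @ dev / (n_members - 1)                 # ensemble covariance, as in (11)
    gain = cov @ H.T @ np.linalg.inv(H @ cov @ H.T + C_dd)
    post_mean = mean + gain @ (d - H @ mean)
    post_cov = (np.eye(mean.size) - gain @ H) @ cov
    # Symmetrize and add a tiny jitter so that sampling is numerically safe.
    post_cov = 0.5 * (post_cov + post_cov.T) + 1e-12 * np.eye(mean.size)
    new_ensemble = rng.multivariate_normal(post_mean, post_cov, size=n_members)
    return new_ensemble, post_mean

rng = np.random.default_rng(0)
a_true, noise_std = 2.0, 0.01                           # hypothetical truth and sensor noise
H = np.array([[1.0, 0.0]])                              # only the state x is observed
C_dd = np.array([[noise_std**2]])

# Initial ensemble: augmented state [x, a] with a deliberately wrong guess for a.
ensemble = np.column_stack([rng.normal(0.0, 0.05, 200),
                            rng.normal(1.5, 0.3, 200)])
x_true = 0.0
for _ in range(200):
    x_true = x_true + 0.1 * (1.0 - a_true * x_true)     # synthetic "experiment"
    ensemble = forecast(ensemble)
    d = np.array([x_true + rng.normal(0.0, noise_std)])
    ensemble, post_mean = assimilate(ensemble, d, H, C_dd, rng)

a_estimate = post_mean[1]
print(f"inferred a = {a_estimate:.3f} (true value {a_true})")
```

Because only the ensemble statistics are carried from one assimilation step to the next, nothing needs to be stored over the whole time period T.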

Figure 4 shows the RMS discrepancy between the experiments, D, and the expected
value of the simulations, D(ψ), for flames forced at three different frequencies.
The EnKF is switched on from time periods 10 to 15. The RMS discrepancy drops by
more than one order of magnitude during this time, to a floor set by the model error.
The largest drops in discrepancy occur when the EnKF is assimilating data just as a
bubble of unburnt gases is pinching off from the flame. During these moments, which
are relatively rare, the parameters converge rapidly towards their final values. This
shows that relatively rare events contain more information than relatively common
events, as is quantified, for example, through the Shannon information content of an
event (MacKay 2003, Eq. (2.34)). After 5 time periods the EnKF is switched off and the
tuned models evolve for a further 3 periods without assimilating data. Figure 5 shows
the models’ expected values and uncertainties (yellow) and the experimental
measurements (black) for one further period. This shows that the EnKF has successfully
assimilated the model parameters from the experimental images and that simulations
with these parameters remain accurate beyond the assimilation period.

Fig. 4 Root-mean-square (RMS) discrepancy between experimental data, D, and model
predictions, D(ψ), for a conical Bunsen flame forced at 200, 300 and 400 Hz (blue/orange/green,
respectively). Data is assimilated from the experiments into the model (DA) between 10 and 15
periods. The snapshots shown in Fig. 5 are taken from the grey window. Image taken from Yu et al.
(2020)

Fig. 5 Snapshots of log-normalized likelihood over one forcing period after combined state and
parameter estimation for 200, 300 and 400 Hz (top/middle/bottom row, respectively). Highly likely
positions of the flame surface are shown in yellow; less likely positions in green. The flame surface
from experimental images is shown as black dots. Image taken from Yu et al. (2020)
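
As a reminder of the definition behind the remark on rare events, the Shannon information content of an outcome x with probability P(x) can be written as below. The two probabilities are illustrative values chosen here, not quantities measured in the flame experiments.

```latex
\[
h(x) = \log_2 \frac{1}{P(x)}, \qquad
h\big|_{P = 0.5} = 1~\text{bit}, \qquad
h\big|_{P = 0.01} = \log_2 100 \approx 6.64~\text{bits}.
\]
```

An event that occurs once in a hundred observations therefore carries several times more information than one that occurs every other observation, which is consistent with the rapid parameter convergence seen during pinch-off.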

The EnKF has the advantages that (i) no calculations are required before the
assimilation process begins, and (ii) it can assimilate any experimental flame that can
be represented by the model H. It has the disadvantages that (i) it cannot run in real
time because the computational time of the simulations, O(10^1) seconds, exceeds
the time between assimilation steps, O(10^-3) seconds; and (ii) if the ensemble starts
far from the data, the ensemble tends to diverge rather than converge to the
experimental results.


3.2 Assimilating with a Bayesian Neural Network Ensemble 


The two disadvantages of the EnKF can be overcome, while retaining uncertainty
estimates, by assimilating data, D, with a Bayesian Neural Network ensemble (BayNNE)
(Pearce et al. 2020; Gal 2016; Sengupta et al. 2020). Each Neural Network, M_i, in
the ensemble is a repeated composition of the function f(W_i x + b_i), where f is a
nonlinear function, x are the inputs, W_i is a matrix of weights, and b_i is a vector of
biases. Together W_i and b_i comprise the parameters θ_i of each neural network. The
set of all parameters in the ensemble is denoted {θ_i}. The posterior state, Ψ(D, {θ_i}),
contains the predicted parameters (e.g. β, L, K, ε, α) of the numerical simulation.
The true targets, a, are the actual parameters of the simulations. The distribution
of the prediction is assumed to be Gaussian: P(Ψ|D, {θ_i}) = N(Ψ̄, C_ΨΨ). Creating
this prediction means learning the mean Ψ̄(D, {θ_i}) and the covariance C_ΨΨ(D, {θ_i})
of the ensemble.

Each NN in the ensemble produces the expected value, μ_i(D, θ_i), and covariance,
σ_i²(D, θ_i), of a Gaussian distribution by minimising the loss function:

\[
\begin{aligned}
J_i ={}& (a - \mu_i)^T \Sigma_i^{-1} (a - \mu_i) + \log(|\Sigma_i|) && (12) \\
 &+ (\theta_i - \theta_{i,\mathrm{anc}})^T \Sigma_{\mathrm{prior}}^{-1} (\theta_i - \theta_{i,\mathrm{anc}}) && (13)
\end{aligned}
\]

where

\[
\Sigma_i^{-1} = \mathrm{diag}(\sigma_i^{-2}) \tag{14}
\]

and θ_{i,anc} are the initial weights and biases of the i-th NN. These are sampled from
the prior distribution P(θ) = N(0, Σ_prior), where Σ_prior = diag(1/N_H) and N_H is
the number of units in each hidden layer. The above task is time-consuming but is
performed just once.
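
The loss function of each ensemble member can be evaluated as in the sketch below, which assumes diagonal covariances so that each matrix product reduces to an element-wise sum. The function name, argument names, and the numbers in the usage example are hypothetical; any relative weighting between the data term and the anchoring term used in practice is not shown.

```python
import numpy as np

def anchored_loss(a, mu, sigma2, theta, theta_anc, prior_var):
    """Gaussian negative log-likelihood plus an anchoring penalty (diagonal covariances)."""
    data_misfit = np.sum((a - mu) ** 2 / sigma2)           # (a - mu)^T Sigma^-1 (a - mu)
    log_det = np.sum(np.log(sigma2))                       # log|Sigma| for diagonal Sigma
    anchor = np.sum((theta - theta_anc) ** 2 / prior_var)  # pull towards the initial draw
    return data_misfit + log_det + anchor

# Usage with made-up numbers: two target parameters and a single network weight.
loss = anchored_loss(a=np.array([1.0, 2.0]),
                     mu=np.array([1.0, 1.0]),
                     sigma2=np.array([1.0, 4.0]),
                     theta=np.array([0.5]),
                     theta_anc=np.array([0.0]),
                     prior_var=np.array([0.25]))
print(f"loss = {loss:.4f}")
```

The log-determinant term stops a network from lowering the misfit term simply by predicting a very large variance.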

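The ensemble therefore contains a set of Gaussians, each with their own means,
μ_i, and covariances, σ_i². These are approximated by a single Gaussian with mean
Ψ̄(D, {θ_i}) and covariance C_ΨΨ(D, {θ_i}) using (Lakshminarayanan et al. 2017):

\[
\bar{\Psi}(D, \{\theta_i\}) = \frac{1}{N} \sum_{i=1}^{N} \mu_i(D, \theta_i) \tag{15}
\]

\[
C_{\Psi\Psi}(D, \{\theta_i\}) = \mathrm{diag}\big(c_{\Psi\Psi}(D, \{\theta_i\})\big) \tag{16}
\]

where N is the number of NNs in the ensemble and

\[
c_{\Psi\Psi}(D, \{\theta_i\}) = \frac{1}{N} \sum_{i=1}^{N} \sigma_i^2(D, \theta_i)
 + \frac{1}{N} \sum_{i=1}^{N} \mu_i^2(D, \theta_i)
 - \left( \frac{1}{N} \sum_{i=1}^{N} \mu_i(D, \theta_i) \right)^2 \tag{17}
\]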


The uncertainty of the ensemble therefore contains the average uncertainty of its 
members, combined with uncertainty arising from the distribution of the means of 
the ensemble members. If this uncertainty is large, the observed data is likely to have 
been outside the training data. This task is quick and is performed at each operating 
condition. 
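
The combination of the ensemble members into a single Gaussian, as in (15) to (17), amounts to a few array operations. The sketch below assumes that the member means and variances are stacked in arrays of shape (N, number of parameters); the function name and the example numbers are hypothetical.

```python
import numpy as np

def combine_ensemble(mu, sigma2):
    """Approximate a set of N Gaussians by one Gaussian with diagonal covariance."""
    mean = mu.mean(axis=0)                                        # ensemble mean
    var = sigma2.mean(axis=0) + (mu ** 2).mean(axis=0) - mean ** 2  # member + spread variance
    return mean, np.diag(var)

# Two ensemble members predicting two parameters (made-up numbers).
mu = np.array([[1.0, 2.0],
               [3.0, 2.0]])
sigma2 = np.array([[0.5, 1.0],
                   [0.5, 1.0]])
mean, cov = combine_ensemble(mu, sigma2)
print(mean)
print(cov)
```

The members disagree on the first parameter, so its combined variance exceeds the average member variance, while for the second parameter the two are equal.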

Fig. 6 Top row: experimental images of one cycle of an acoustically forced conical Bunsen flame;
the left half (a) shows the raw image while the right half (b) shows the detected edge. Bottom row:
the flame edge and its uncertainty when assimilated into a G-equation model with an EnKF (c) and
a BayNNE (d). With this model, propagation of perturbations down the flame is captured well but
the pinch-off event is not. Image adapted from Croci et al. (2021)

The BayNNE is trained on 8500 simulations of the level set solver used in Sect. 3.1.
The parameters varied are the flame aspect ratio β, the Markstein length L, the ratio,
K, between the mean flow speed and the phase speed of perturbations down the
flame, the amplitude of velocity perturbations, ε, the mean flow parabolicity, α, and
the Strouhal number, St. The parameters are sampled using quasi-Monte Carlo in
order to obtain good coverage of the parameter space within fixed ranges. For each
simulation, 200 evenly-spaced snapshots of a forced periodic solution are stored. The
data, D, used for training takes the form of 10 consecutive snapshots extracted from
these images. The total library of data therefore consists of 8500 × 200 = 1.7 × 10^6
sets of data, D, each with known parameters a. The neural networks are trained to
recognize the parameter values a from the data D. Training takes around 12 hours
per NN on an NVIDIA P100 GPU. Recognizing the parameter values takes O(10~7) 
seconds on an Intel Core i7 processor on a laptop, which is sufficiently fast to work 
in real time. 
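
The quasi-Monte Carlo sampling of the parameter space can be sketched with a Halton sequence. The text does not state which low-discrepancy sequence was used, so the Halton sequence is an assumption, and the parameter ranges below are placeholders rather than the ranges used to build the library.

```python
import numpy as np

def van_der_corput(n_points, base):
    """First n_points of the van der Corput sequence in the given base (starting at index 1)."""
    seq = np.zeros(n_points)
    for k in range(n_points):
        i, f, x = k + 1, 1.0, 0.0
        while i > 0:
            f /= base
            x += f * (i % base)
            i //= base
        seq[k] = x
    return seq

def halton(n_points, n_dims):
    """Halton points in the unit hypercube, one prime base per dimension (up to 6 dimensions)."""
    primes = [2, 3, 5, 7, 11, 13][:n_dims]
    return np.column_stack([van_der_corput(n_points, p) for p in primes])

# Hypothetical (low, high) ranges for the six varied parameters.
ranges = {"beta": (2.0, 10.0), "L": (0.0, 0.1), "K": (0.0, 1.5),
          "epsilon": (0.0, 1.0), "alpha": (0.0, 1.0), "St": (2.0, 20.0)}
names = list(ranges)
low = np.array([ranges[name][0] for name in names])
high = np.array([ranges[name][1] for name in names])

samples = low + (high - low) * halton(8500, len(names))   # one row per simulation
n_training_sets = samples.shape[0] * 200                  # 200 snapshot windows per simulation
print(samples.shape, n_training_sets)
```

Unlike independent random draws, successive points of a low-discrepancy sequence fill the gaps left by earlier points, which is what gives the good coverage mentioned above.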

The top row of Fig. 6 shows 10 snapshots of a forced Bunsen flame experiment
alongside the automatically-detected flame edge. The bottom row shows the modelled
flame edge and its variance, assimilated with the EnKF (left) and the BayNNE (right).
The flame edge is shown in black. As anticipated, the expected values found with both
assimilation methods are almost identical. The prediction is close to the experiments
but, because of model error, the EnKF and the BayNNE both struggle to fit the most
extreme pinch-off event at 0.67. The uncertainty in the BayNNE is greater than
that of the EnKF because it assimilates just 10 flame images, while the EnKF has
assimilated over 500 images by the time this sequence is generated. Alternative NN
architectures, such as long short-term memory networks, may be able to reduce this
uncertainty.

The fact that the BayNNE assimilates just 10 snapshots is a disadvantage when 
the flame behaviour is periodic over many cycles, as in the previous example, but an 
advantage when the flame behaviour is intermittent, as in the next example. Inter- 
mittency is commonly observed in thermoacoustic systems, particularly when they 
are close to a subcritical bifurcation to instability (Juniper and Sujith 2018; Nair 
et al. 2014). Bursts of periodic behaviour are interspersed within moments of quasi- 
stochastic behaviour and, while these can be identified by eye and with recurrence 
plots (Juniper and Sujith 2018), they are not sufficiently regular to be assimilated 
with the EnKF.

In the next example, images of a bluff-body stabilized turbulent premixed flame
(Paxton et al. 2019, 2020) are recorded at 10 kHz using OH PLIF, and the flame edge
is extracted and smoothed to remove the turbulent wrinkles. A BayNNE trained on 10
snapshots of G-equation simulations with 2400 combinations of parameters, a, then
identifies the most likely parameters from 10 observed snapshots. In this example
the model contains an extra parameter: the spatial growth rate, η, of perturbations.

Figure 7 shows the five assimilated parameters, (K, ε, η, St, β), and their
uncertainties during 430 timesteps of an experimental run imaged at 2.5 kHz. During
this run, there are four to five oscillation cycles. The BayNNE successfully identifies
the G-equation parameters that match the experimental results and, importantly,
estimates their uncertainties. At four moments during the run, Fig. 7 shows snapshots
of the experimental image (top left quadrant) alongside the expected value and
uncertainty from the G-equation simulations. Because the G-equation simulation is
physics-based, it can extrapolate beyond the window viewed in the experiments, as
shown in the images. The distribution of fluctuating heat release rate, with its
uncertainty, can be calculated from the model. This can then be expressed as a spatial
distribution of the flame interaction index, n, and the flame time delay, τ, as in Fig. 8,
which can then be entered into a thermoacoustic network model or Helmholtz solver.
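
For reference, one common way of writing a spatially distributed n-τ model is given below. The normalization, the choice of reference velocity u, and the Fourier convention (time dependence e^{iωt}) are assumptions made here and may differ from those used to produce Fig. 8.

```latex
\[
q'(\mathbf{x}, t) = n(\mathbf{x})\, u'\big(t - \tau(\mathbf{x})\big)
\quad \Longleftrightarrow \quad
\hat{q}(\mathbf{x}, \omega) = n(\mathbf{x})\, e^{-i \omega \tau(\mathbf{x})}\, \hat{u}(\omega)
\]
```

In this form the two fields n(x) and τ(x) are the only inputs that a network model or Helmholtz solver needs from the flame model.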


4 Identifying Precursors to Thermoacoustic Instability 
with BayNNEs 


The noise from a thermoacoustically-stable turbulent combustor has broadband char- 
acteristics and is often assumed to be stochastic (Clavin et al. 1994; Burnley and 
Culick 2000). This assumption is a reasonable starting point for stochastic analysis 
(Clavin et al. 1994) but does not exploit the fact that combustion noise contains useful 
information about the system’s proximity to thermoacoustic instability (Juniper and 
Sujith 2018, Sect. 4). Analysis of this noise usually involves a statistical measure to
detect transition away from stochastic behaviour. This can be a measure of departure 
from chaotic behaviour, using techniques for analysing dynamical systems (Gotoda 
et al. 2012; Sarkar et al. 2016; Murugesan and Sujith 2016), or the detection of pre- 
cursors such as intermittency (Juniper and Sujith 2018; Nair et al. 2014; Scheffer 
et al. 2009). 

These methods quantify the behaviour that a researcher thinks should be impor- 
tant, based on observation of similar systems. This approach is generally applicable 
but has the disadvantage that it will miss information that the researcher does not 
think is important, and cannot extract information that is peculiar to a particular 
engine. Given that this research is motivated by industrial applications in which sev- 
eral nominally-identical models of the same engine are deployed, it makes sense 
to extract as much information as possible from that particular engine model. In 
other words, we ask whether machine learning techniques can learn to recognize
precursors on one set of engines and then identify precursors on another set of
nominally-identical engines. Further, we ask whether the machine learning approach
is better than techniques that use a statistical measure. In this section, we examine a
laboratory scale combustor to develop the method, then three aeroplane engine fuel
injector nozzles in an intermediate pressure rig, and then 15 full scale commercial
aeroplane engines.

Fig. 7 Assimilated parameters (K, ε, η, St, β) of a G-equation model of a bluff-body-stabilized
premixed flame during a sequence of 428 snapshots. The parameters are assimilated with a Bayesian
Neural Network Ensemble (BayNNE), which also estimates the uncertainty in the assimilated
values. The four flame images show (top-left of each frame) the detected flame edge from the
experimental OH PLIF image and (remainder of each frame) the expected values and uncertainties
in the G-equation model prediction. Image adapted from Croci et al. (2021)

Fig. 8 Spatial distribution of n and τ derived from the G-equation model of the
bluff-body-stabilized premixed flame shown in Fig. 7


4.1 Laboratory Combustor 


In the first study we place a 1 kW turbulent premixed flame inside a steel tube with 
length 800 mm and diameter 80 mm (Sengupta et al. 2021). The system is run at 900 
different operating conditions varying power, equivalence ratio, fuel composition, 
and the tube exit area. All operating points are thermoacoustically stable, but the 
thermoacoustic mechanism is active and some points are close to thermoacoustic
instability. 

For each operating point, the combustion noise is recorded at 10,000 Hz. The
system is then forced for 50 ms at 230 Hz, which is close to the natural frequency of 
the first longitudinal mode. The decay rate of the acoustic oscillations is extracted 
from the microphone signal. We then train a Bayesian Neural Network ensemble 
(BayNNE) to identify the decay rate from 300 ms clips of combustion noise before 
the acoustic excitation. The decay rate changes from positive to negative at the point
of thermoacoustic instability, so it is a good measure of the proximity to thermoacoustic
instability. The BayNNE returns the uncertainty in its predictions, ensuring that the 
model does not make overconfident predictions from inputs that differ significantly 
from those on which it was trained. If the priors are specified correctly, this technique 
can work with smaller amounts of data and be more resistant to over-fitting (Pearce 
et al. 2020). 
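
One simple way to extract a decay rate from the response to a pulse is to fit a straight line to the logarithm of the oscillation peaks. The sketch below applies this to a synthetic, noise-free ring-down; the fitting method and the decay rate of 10 s⁻¹ are assumptions made for illustration, and only the sampling rate and forcing frequency are taken from the text.

```python
import numpy as np

fs = 10_000.0            # sampling rate quoted in the text [Hz]
f0 = 230.0               # forcing frequency quoted in the text [Hz]
true_decay_rate = 10.0   # hypothetical value [1/s], chosen only for this illustration

# Synthetic ring-down after the forcing is switched off.
t = np.arange(0.0, 0.3, 1.0 / fs)
p = np.exp(-true_decay_rate * t) * np.cos(2.0 * np.pi * f0 * t)

def decay_rate_from_ringdown(t, p):
    """Fit a straight line to the log of the local maxima of |p(t)| and return minus its slope."""
    amp = np.abs(p)
    is_peak = (amp[1:-1] > amp[:-2]) & (amp[1:-1] >= amp[2:])
    idx = np.where(is_peak)[0] + 1
    slope, _ = np.polyfit(t[idx], np.log(amp[idx]), 1)
    return -slope

decay_rate = decay_rate_from_ringdown(t, p)
decay_timescale = 1.0 / decay_rate
print(f"decay rate = {decay_rate:.2f} 1/s, decay timescale = {decay_timescale:.3f} s")
```

With a measured microphone signal, band-pass filtering around the mode of interest would normally be needed before the peaks are identified.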

Before training, all the input variables are normalized in order to remove the 
amplitude information. The parameters a_i of each ensemble member are initialized
by drawing from a Gaussian prior distribution with zero mean and variance equal to
1/N_H, where N_H is the number of hidden nodes in the previous layer of the NN.
This initialization means that the distribution of predictions made by the untrained 
prior neural network will be approximately zero-centred with unit variance. Each 
ensemble member is trained normally, but with a modified loss function that anchors 
the parameters to their initial values. This procedure approximates the true posterior 
distribution for wide neural networks (Pearce et al. 2020). We train on 80% of the 
operating points, retain 20% for testing, and train ten different models using ten 
random test-train splits. This ensures the stability of our algorithm’s performance 
with respect to different train-test splits. 
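
Two steps of this procedure are easy to check numerically: the effect of the 1/N_H prior variance on the spread of the untrained network, and the repeated random splits. The sketch below uses a single linear layer with a hypothetical width of 256 and unit-variance stand-in inputs, so it illustrates the scaling argument rather than the network used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights drawn from the prior N(0, 1/N_H); N_H = 256 is a hypothetical layer width.
n_hidden = 256
W = rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, 64))
h = rng.normal(0.0, 1.0, size=(10_000, n_hidden))   # stand-in unit-variance activations
z = h @ W                                           # outputs of the next layer
prior_output_variance = z.var()
print(f"variance of untrained layer output = {prior_output_variance:.3f}")

# Ten random 80/20 train-test splits of the 900 operating points.
n_points = 900
n_train = n_points * 8 // 10
splits = []
for _ in range(10):
    order = rng.permutation(n_points)
    splits.append((order[:n_train], order[n_train:]))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each of the ten models would then be trained on its own training indices and scored on its own test indices, and the spread of the ten scores indicates how sensitive the result is to the split.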

Figure 9a shows the decay timescale (the reciprocal of the decay rate) predicted
by the BayNNE, compared with the decay timescale measured from the subsequent
response to the pulse. The grey bars show the uncertainty output by the BayNNE.
The decay timescales are predicted reasonably accurately. The grey uncertainty bars
widen for the operating points closer to instability because there are only a few
operating points close to instability; the decay timescale exceeds 0.3 s for just 13
operating points in the training set. This shows that the BayNNE can successfully
predict how far the system is from instability while also indicating how confident it
is in that prediction.

Fig. 9 (a) Decay timescale, ±2 standard deviations, predicted with a BayNNE, (b) Hurst exponent,
and (c) autocorrelation decay, as functions of the measured decay timescale for thermoacoustic
oscillations of a turbulent premixed Bunsen flame in a tube. The BayNNE provides the most reliable
indicator of proximity to thermoacoustic instability. This figure is recreated from the data in
Sengupta et al. (2021)

Figure 9b, c show the generalized Hurst exponent and the autocorrelation decay of
the combustion noise as functions of the measured decay timescale. As expected, the
Hurst exponent drops and the autocorrelation decay increases as the decay timescale 
increases, showing that these measurements are working as precursors of combustion 
instability. They are not as accurate, however, as the BayNNE and contain no measure 
of uncertainty. It is clear therefore that, when trained on this specific combustor, the 
BayNNE out-performs the Hurst exponent and autocorrelation decay. This outcome 
would be reversed, of course, if the BayNNE were applied to a different combustor, 
without retraining. 

We also trained the BayNNEs to recognize the equivalence ratio and burner power 
from 300 ms of combustion noise. The BayNNE could recognize the equivalence
ratio with an rms error of 3.5% and the power with an rms error of 2%. This shows
that each operating condition has a unique acoustic signature that the BayNNE can 
learn. The experimentalist in the room can hear that all operating conditions sound 
slightly different, but cannot recognize the operating condition to the accuracy that 
the BayNNE can achieve. 


4.2 Intermediate Pressure Industrial Fuel Spray Nozzle 


The second study is on an industrial intermediate pressure combustion test rig, which
is equipped with three pressure transducers, sampling at 50 kHz. Experiments are
performed on three different fuel injectors over a range of operating points in order
to identify operating points that are thermoacoustically unstable. The injectors are
labelled 1a, 1b, and 2. Injectors 1a and 1b are nominally identical. The operating
points are identified by their air-fuel ratio (AFR) and their exit temperature (T30). The
threshold of thermoacoustic instability is defined as the operating points at which
the acoustic amplitude exceeds 0.5% of the static pressure. The black lines in Fig. 10
show this threshold in (AFR, T30)-space. Despite being nominally identical, injectors
1a and 1b have instability thresholds at slightly different positions in (AFR, T30)-space.

Fig. 10 The black line shows the thermoacoustic instability threshold as a function of air-fuel
ratio (AFR) and exit temperature T30 for three aeroplane engine fuel injectors in an intermediate
pressure rig. The coloured lines show the distance to the black line. Injectors 1a and 1b are nominally
identical

Fig. 11 a Hurst exponent, b autocorrelation decay, c permutation entropy calculated from the
pressure signal of injector 1a in the intermediate pressure rig, as a function of the distance to the
instability threshold in Fig. 10a. A positive (negative) distance indicates stable (unstable)
thermoacoustic behaviour

We normalize the ranges of AFR and T30 to run from 0 to 1 and then train a
BayNNE to recognize the Euclidean distance to the instability threshold, based on
500 ms of normalized pressure measurements. Stable points are assigned positive 
distances and unstable points are assigned negative distances. We compare the predic- 
tions from the BayNNE with those from the autocorrelation decay, the permutation 
entropy, and the Hurst exponent. 
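
The construction of the training target described above can be sketched as follows. This is a minimal illustration, assuming that the threshold curve is available as a set of sampled points in normalized (AFR, T30)-space and that the distance is taken to the nearest sample; the function names, parameter ranges, and data values are hypothetical and are not taken from the study.

```python
import numpy as np

def normalise(x, lo, hi):
    """Map values from the range [lo, hi] onto [0, 1]."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def is_stable(amplitude, static_pressure, fraction=0.005):
    """Stable if the acoustic amplitude is at most 0.5% of the static pressure."""
    return np.asarray(amplitude) <= fraction * np.asarray(static_pressure)

def signed_distance(points, threshold_pts, stable):
    """Signed Euclidean distance to the nearest sampled threshold point.

    points: (n, 2) normalised (AFR, T30) operating points.
    threshold_pts: (m, 2) points sampled along the threshold curve.
    stable: (n,) booleans; stable -> positive, unstable -> negative.
    """
    points = np.asarray(points, dtype=float)
    threshold_pts = np.asarray(threshold_pts, dtype=float)
    diff = points[:, None, :] - threshold_pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return np.where(np.asarray(stable), dist, -dist)

# Hypothetical ranges and data, for illustration only.
afr = normalise([50.0, 30.0], lo=20.0, hi=60.0)        # -> 0.75, 0.25
t30 = normalise([750.0, 750.0], lo=600.0, hi=900.0)    # -> 0.5, 0.5
points = np.column_stack([afr, t30])
# Toy threshold: the vertical line AFR = 0.5 in normalised space.
threshold_pts = np.column_stack([np.full(101, 0.5), np.linspace(0.0, 1.0, 101)])
stable = is_stable(amplitude=[100.0, 900.0], static_pressure=[1.0e5, 1.0e5])
distance = signed_distance(points, threshold_pts, stable)  # approx [0.25, -0.25]
```

Normalizing both axes before measuring the distance stops the quantity with the larger numerical range from dominating the Euclidean norm.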

Figures 11a-c show the Hurst exponent, the autocorrelation decay, and the
permutation entropy for injector 1a. The Hurst exponent reduces significantly as the system
becomes unstable and this is a useful indicator of the instability threshold, albeit with
significant unquantified uncertainty. The autocorrelation decay tends towards zero
as the system becomes more unstable but, for this data, barely changes across the
instability threshold and therefore does not provide a useful indicator for the
threshold. The permutation entropy drops only after the system has crossed the threshold from
stable to unstable operation, meaning that it is not useful as a precursor in these
experiments.

Fig. 12 Predicted distance to the instability threshold ±2 s.d. as a function of measured distance
to instability threshold for a injector 1a, b injector 1b, c injector 2 when the prediction is obtained
from a BayNNE trained on injector 1a. Injectors 1a and 1b are nominally identical

The BayNNE is trained on the training points of 1a and then applied to test points
of 1a, 1b, and 2. Figure 12a shows the distance from the instability threshold predicted
by the BayNNE compared with the true distance. Uncertainty bands of the BayNNE 
are shown in grey. The BayNNE provides a remarkably accurate prediction of the 
distance to instability from the pressure signal alone. Figure 12b shows the distance 
from the instability threshold predicted by the BayNNE trained on injector 1a when 
applied to the pressure data from the nominally identical injector 1b. The BayNNE 
performs well when the system is unstable (distance less than 0) but performs less 
well, and assigns itself greater uncertainty, when the system is stable (distance greater 
than 0). As mentioned above, 1b is unstable over a different range of
(AFR, T30)-space than 1a, and, despite this difference, the BayNNE has successfully identified
the distance to instability on the new injector. The prediction is most inaccurate and 
uncertain, however, when the system is stable, which is the most useful scenario 
because it then acts as a precursor to instability. Figure 12c shows the distance from 
the instability threshold predicted by the BayNNE trained on 1a when applied to
the pressure data from injector 2. The BayNNE performs badly, particularly when 
the system is stable. This confirms that a BayNNE trained on one thermoacous- 
tic system is a good indicator of thermoacoustic precursors on nominally-identical 
thermoacoustic systems, out-performing statistical measures, but is not useful for dif- 
ferent systems. This is not surprising, given that the BayNNE is using all available 
information from the pressure signal of this particular system, while the statistical 
methods are quantifying general features in all systems.


4.3 Full Scale Aeroplane Engine


The third study is on 15 full scale prototype aeroplane engines operating at sea 
level (McCartney et al. 2022). The engines are equipped with two dynamic pres- 
sure sensors upstream of the combustor, sampling at 25 kHz. The compressor exit 
temperature, compressor exit pressure, fuel flow rate, primary/secondary fuel split, 
and core speed are sampled at 20 Hz. The core speed is increased (known as a ramp 
acceleration) such that the engines deliberately enter a thermoacoustically-unstable 
operating region. The instability threshold is defined by the point at which the peak 
to peak amplitude exceeds a certain value. Although the engines are nominally iden- 
tical, the instability threshold is exceeded at a different core speed for each engine. 
Here, we investigate whether a BayNNE trained on the operating points and pres- 
sure signals from some of the engines can provide a useful warning of impending 
instability during a ramp acceleration in the other engines. 

Previously we used BayNNEs to predict the decay rate of acoustic oscillations
(Sect. 4.1) or the distance to instability in parameter space (Sect. 4.2). Now we
consider a more practical quantity: the probability that the combustor will become
thermoacoustically unstable within the next Δt seconds during a ramp acceleration. We
assume that this probability depends on the current operating point of the system, the
future operating point, and the time it will take to reach the future operating point. In
line with Sects. 4.1 and 4.2 we also assume that the current pressure signal contains
useful information about how close the combustor is to thermoacoustic instability.
We downsample the signal from a single sensor to 25 kHz, extract 4096 datapoints,
which corresponds to around 160 ms, and then process it: (i) into a binary
indication of whether the peak-to-peak pressure threshold has been exceeded; (ii) with
de-trended fluctuation analysis (DFA) (Gotoda et al. 2012). The BayNNE is trained
to learn the binary signal at time Δt in the future, based on the operating conditions
at time Δt in the future and the pressure signal in the present. The future time, Δt,
is varied from 100 ms to 1000 ms in steps of 100 ms. For comparison, a BayNNE
is trained to learn the binary signal at time Δt in the future, based on the operating
conditions alone (i.e. without the pressure data).
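
The preprocessing described above can be sketched as follows. This is a minimal illustration, assuming first-order DFA summarized by a single scaling exponent and a hypothetical peak-to-peak limit; the study may use a different DFA variant or a richer DFA feature, and the synthetic signal, scales, and limit below are placeholders rather than values from the experiments.

```python
import numpy as np

FS = 25_000       # sampling rate after downsampling (Hz), from the text
N_WINDOW = 4096   # samples per window, i.e. 4096 / 25000 = 0.16384 s

def dfa_exponent(x, scales=(16, 32, 64, 128, 256)):
    """First-order DFA: slope of log F(s) against log s."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())
    flucts = []
    for s in scales:
        n_seg = len(profile) // s
        segs = profile[: n_seg * s].reshape(n_seg, s).T   # shape (s, n_seg)
        t = np.arange(s)
        coeffs = np.polyfit(t, segs, 1)                   # linear fit per segment
        trend = np.outer(t, coeffs[0]) + coeffs[1]
        flucts.append(np.sqrt(np.mean((segs - trend) ** 2)))
    slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return float(slope)

def exceeds_threshold(window, p2p_limit):
    """Binary indicator: has the peak-to-peak pressure limit been exceeded?"""
    return bool(window.max() - window.min() > p2p_limit)

def make_example(signal, start, horizon_s, p2p_limit):
    """Feature from the present window, label from the window horizon_s later."""
    shift = int(round(horizon_s * FS))
    now = signal[start : start + N_WINDOW]
    future = signal[start + shift : start + shift + N_WINDOW]
    return dfa_exponent(now), exceeds_threshold(future, p2p_limit)

# Synthetic stand-in for a pressure trace: noise whose amplitude jumps at sample 12000.
rng = np.random.default_rng(0)
signal = rng.standard_normal(30_000)
signal[12_000:] *= 10.0
feature, label_100ms = make_example(signal, 0, horizon_s=0.1, p2p_limit=20.0)
_, label_500ms = make_example(signal, 0, horizon_s=0.5, p2p_limit=20.0)
```

Each training example pairs a feature computed from the present window with a label taken Δt later, so the same pressure window yields a different label for each prediction horizon.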

There are three stages: tuning, training, and testing. In the tuning stage the number 
of hidden layers (2-10) and number of neurons in each layer (10-100) are optimized
by performing a random search over these hyperparameters. For each combination, 
a BayNNE is trained on the training data and evaluated on the tuning data. We 
then select the hyperparameters and number of training epochs that perform best. 
In the testing stage, the BayNNE with optimal hyperparameters is applied to the 
testing data. This outputs the log likelihood of the BayNNE model, M, given the 
data D. The different BayNNEs can then be ranked by the relative sizes of their log 
likelihoods. (The absolute value is not important.) 
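
The tuning stage can be sketched as a random search over the two hyperparameter ranges quoted above. This is a minimal illustration: `toy_evaluate` is a hypothetical stand-in for training a BayNNE on the training data and scoring it on the tuning data, and the selection of the number of training epochs is omitted.

```python
import random

def random_search(evaluate, n_trials, seed=0):
    """Randomly sample architectures and keep the one with the best score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {
            "n_layers": rng.randint(2, 10),     # hidden layers, range from the text
            "n_neurons": rng.randint(10, 100),  # neurons per layer, range from the text
        }
        score = evaluate(config)  # higher is better, e.g. tuning-set log likelihood
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

def toy_evaluate(config):
    """Hypothetical stand-in for 'train on training data, score on tuning data'."""
    return -abs(config["n_layers"] - 4) - abs(config["n_neurons"] - 50) / 10.0

best_config, best_score = random_search(toy_evaluate, n_trials=50)
```

Keeping the tuning data separate from the testing data means that the hyperparameters are never selected on the data used for the final ranking.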

Figure 13 shows the log likelihoods of the BayNNE trained on the operating point
(OP) alone and the BayNNE trained on the operating point and the DFA pressure
signal (DFA). The OP BayNNE is the baseline against which to compare the DFA
BayNNE. For future times below 400 ms, the tuned DFA BayNNE model fits the
binary signal at that future time better than the tuned OP BayNNE model. In other
words, the inclusion of pressure data gives smaller errors in the predicted probability
that the threshold will be exceeded at that future time. For future times above
400 ms, the tuned DFA BayNNE model is marginally less likely than the OP
BayNNE. This shows that the current pressure signal contains information that is
useful up to 400 ms into the future, but no longer.

Fig. 13 The log likelihood of observing this data, given (i) the BayNNE trained on the operating
point alone (OP) and (ii) the BayNNE trained on the operating point and DFA pressure signal (OP &
DFA). For prediction horizons lower than 400 ms, inclusion of the pressure signal renders the model
more likely and therefore more predictive. This figure is recreated based on the data in McCartney
et al. (2022)
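
The ranking of two models by log likelihood can be made concrete with a small example. This is a minimal sketch, assuming a Bernoulli likelihood for the binary threshold signal; the exact form of the likelihood is not stated here, and the labels and predicted probabilities below are hypothetical.

```python
import numpy as np

def bernoulli_log_likelihood(y, p, eps=1e-12):
    """Log likelihood of binary outcomes y under predicted probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Hypothetical test labels: was the threshold exceeded at the future time?
y = [0, 0, 1, 1]
p_op = [0.4, 0.5, 0.5, 0.6]    # vaguer predictions (stand-in for the OP model)
p_dfa = [0.1, 0.2, 0.8, 0.9]   # sharper predictions (stand-in for the DFA model)

ll_op = bernoulli_log_likelihood(y, p_op)
ll_dfa = bernoulli_log_likelihood(y, p_dfa)
better = "DFA" if ll_dfa > ll_op else "OP"
```

Only the difference between the two log likelihoods matters for the ranking, which is consistent with the remark that the absolute value is not important.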

Figure 14 shows the error in the predicted core speed at which the system becomes
unstable. The OP BayNNE knows only the future operating point. The error in the
predicted onset core speed arises from differences between the engines being tested. If
all the engines were to behave identically, this error would be zero. The DFA BayNNE
knows the future operating point and the current pressure signal. As expected from
Fig. 13, the error in the predicted onset core speed drops around 400 ms before
the instability starts. In other words, in this ramp acceleration, the pressure signal
becomes informative around 400 ms before an instability starts but is not informative
before then.

Fig. 14 Mean error in the predicted core-speed at which the engine will become thermoacoustically
unstable as a function of time to instability onset as predicted by the BayNNE trained on the OP
alone, and the BayNNE trained on the OP and the DFA pressure signal. This figure is recreated
based on the data in McCartney et al. (2022)


5 Conclusion 


In the late 1990s, we were promised that the internet would change everything. 
Three decades later, very few internet-only companies have survived. The winners 
have been the companies who integrated the internet into what they did well already. 
If Machine Learning is to science what the internet was to business then the fields that 
thrive will be those that integrate machine learning into what they do well already. 
Fluid Dynamics in general, and Thermoacoustics in particular, is well placed to do 
this because the methods work well and the industrial motivation is strong. 
Machine learning is successful because of its relentless focus on data, rather 
than on models, correlations, and assumptions that the research community has 
become used to. These models are not badly wrong, but they are rarely quantitatively 
accurate, and are therefore of limited use for design. It is particularly powerful to 
combine these physics-based models with one of the tools of probabilistic machine 
learning: Bayesian inference. By assimilating experimental or numerical data, we 
can turn qualitatively-accurate models into quantitatively accurate models, quantify 
their uncertainty, and rank the evidence for each model given the data. This should 
become standard practice at the intersection between low order models and experi- 
ments (numerical or physical). The days of sketching a line by eye through a cloud 
of points on a 2D plot should be over. This should be replaced by rigorous Bayesian 
inference, with all subjectivity well-defined, and in as many dimensions as required. 
For low order models, assimilation with Laplace’s method combined with first 
and second order adjoints of those models is fast and powerful. For models with 
more than a few hundred degrees of freedom, this method becomes cumbersome. 
Nevertheless, it is still possible to assimilate data into larger physics-based models 
and to estimate the uncertainty in their parameters using iterative methods such as the 
Ensemble Kalman Filter, or parameter recognition with Bayesian Neural Network 
Ensembles. This is a powerful way to combine the practical aspects of Machine 
Learning with the attractive aspects of physics-based models. It is demonstrated here 
for a simple level set solver but, with enough simulations, could be extended to CFD. 
Sometimes, however, we must accept that we do not recognise or cannot model 
the influential physical mechanisms in a system we are observing. In these circum- 
stances, physics-agnostic neural networks are an ideal tool because they can learn 
to recognise features that humans will miss. Perhaps the most striking conclusion 
of the experiment reported in Sect. 4.1 is that every operating point had a different
sound and that a Neural Network could recognise the operating point just from that
sound. A human may suspect this but would be unable to remember them all. This
is an interesting feature for aircraft engines because fleets contain thousands of
nominally-identical but slightly different engines. The signs of impending thermoa- 
coustic instability can therefore be learned from the sound on a handful of engines 
and applied confidently to the others. This gives a way to avoid thermoacoustic 
instability, even if it has been impossible to design it out. 

For thermoacoustics, this chapter shows some promising ways to combine 30 
years of machine learning with 200 years of physics-based learning. If we continue 
to fly long distance or send rockets into space, we will need to continue to avoid 
thermoacoustic instability. With novel research methods and continual industrial 
motivation, the field of thermoacoustics looks set to be interesting for many decades 
to come.


Summary


The increasing availability of data is a shared trait of several research fields. It opens 
up great opportunities to advance our understanding of physical processes and lead 
to disruptive technological innovations. Machine learning methods are becoming an 
essential resource in combustion science to deal with previously unmet challenges 
in the field, associated with the number of species involved in combustion processes, 
the small scales and the non-linear turbulence-chemistry interactions characterising 
the behaviour of combustion devices. Turbulent reacting flows are inherently multi- 
scale and multi-physics and involve a broad range of scales, both for chemistry and 
fluid dynamics. Unlike typical machine learning applications that rely on inexpensive 
system evaluations, combustion involves experiments that may be difficult to repeat 
(especially at the scale of interest) and simulations on high-performance computing 
infrastructures. Contrary to common intuition, available combustion data are very
sparse: massive datasets are available, but for very few operating conditions (in
terms of chemical composition, turbulence level, turbulence/chemistry interactions,
etc.), which makes the generalisation of machine learning algorithms a challenging
task. This leads to specialised needs that have pushed the research into developing
hybrid physics-based, data-driven methods for combustion applications. This book 
stems from this observation to present current trends for ML methods in combustion 
research, in particular: 


1. The use of machine learning to understand and learn reaction rates and reaction 
mechanisms and accelerate chemistry integration. 

2. The use of linear and non-linear dimensionality reduction approaches for feature 
extraction, classification and development of reduced-order models (ROMs). 

3. The combination of advanced neural network architectures and physics-based 
models to parametrise the unresolved quantities of interest in reacting flow sim- 
ulations and to model, forecast, and avoid thermoacoustic instabilities. 

4. The development of data-based frameworks designed to detect spatial and
temporal events of interest during high-performance computing (HPC) simulations.


This book explored the growing intersection of machine learning methods with 
physics-based modelling for turbulent combustion problems. Without the ambition 
of being exhaustive, it gathered contributions from international experts in the field, 
covering a variety of problems and application areas. As such, it offers a snapshot of 
the current trends in the community and discusses potential future developments.
Looking ahead, the main challenge for data-driven approaches applied to combustion 
will be to demonstrate the interpretability, explainability, and generalizability of the 
proposed modelling strategies in practical applications. This is critical to implement- 
ing major technological modifications and leading the technological transformation 
towards sustainable combustion technologies based on renewable fuels, including 
E-fuels. We are certain that this field will advance rapidly in the near future and we 
hope that the information presented in this volume will contribute towards that
development and specifically help curious readers.